
Author Posts


How to Use TLS Securely in Flask Applications

Dr. Paul Duplys, Dr. Roland Schmitz
26 Nov 2024
10 min read
Introduction

There is a lot of documentation, and a number of books, on the TLS protocol and the theory behind it. But what do you do when you actually have to deploy TLS in a real-world application? For example, what do you do to enable TLS for your web application written in Flask? Securing web applications is more important than ever, and Transport Layer Security (TLS) plays a vital role in protecting data in transit. For developers using Flask, a popular Python web framework, implementing TLS might seem daunting. This article provides a practical guide to setting up TLS for Flask applications, focusing on Nginx as a reverse proxy, secure configurations, and avoiding common pitfalls. Whether you're a beginner or an experienced developer, you'll find actionable insights to ensure your application is both performant and secure.

TLS and Flask

If you want to write a web application, chances are you'll end up using Flask, a popular micro web framework written in Python. Flask is well known for its simplicity, flexibility, performance, and beginner-friendly learning curve. Flask is a WSGI (Web Server Gateway Interface) application, where WSGI is a standard interface between web servers and Python web applications or frameworks. Finally, a WSGI application is a Python callable that accepts two arguments, environ and start_response, and returns an iterable, allowing it to handle HTTP requests and responses in a standardized manner. In Flask, a WSGI server runs the application, converting incoming HTTP requests to the standard WSGI environment and converting outgoing WSGI responses to HTTP responses. When deploying your Flask application to production, the Flask documentation strongly recommends using a dedicated WSGI server rather than the built-in development server.

WSGI servers have HTTP servers built in. However, when serving your application with a WSGI server, it is good practice — and may even be necessary, depending on the desired configuration — to put a dedicated HTTP server in front of it. This so-called "reverse proxy" handles incoming requests, TLS, and other security and performance concerns better than the WSGI server.

The Flask documentation describes how to set up Nginx as a reverse proxy and provides the following example configuration. Note that the example does not include TLS support.

```
server {
    listen 80;
    server_name _;

    location / {
        proxy_pass http://127.0.0.1:8000/;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Prefix /;
    }
}
```

So, what do you have to do to enable TLS? In Nginx, TLS — and thus HTTPS — support is provided by a dedicated module called ngx_http_ssl_module, which itself relies on the well-known cryptography library OpenSSL. Here is an example TLS configuration given in the ngx_http_ssl_module documentation:

```
worker_processes auto;

http {
    ...
    server {
        listen              443 ssl;
        keepalive_timeout   70;

        ssl_protocols       TLSv1 TLSv1.1 TLSv1.2 TLSv1.3;
        ssl_ciphers         AES128-SHA:AES256-SHA:RC4-SHA:DES-CBC3-SHA:RC4-MD5;
        ssl_certificate     /usr/local/nginx/conf/cert.pem;
        ssl_certificate_key /usr/local/nginx/conf/cert.key;
        ssl_session_cache   shared:SSL:10m;
        ssl_session_timeout 10m;
        ...
    }
}
```

But wait, what about insecure TLS configurations? How do you know that your setup will hold off the bad guys? To help you with this task, we wrote a small script which you can download at https://github.com/TLS-Port/TLSAudit.
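Putting the two examples together, a minimal TLS-enabled reverse proxy for a Flask application could look roughly like the sketch below. The server name, certificate paths, and upstream address are illustrative placeholders (not values from the Flask or Nginx documentation), and the protocol choice is restricted to TLS 1.2 and 1.3 in line with the recommendations later in this article — adjust everything to your own environment.

```
server {
    listen 443 ssl;
    server_name example.com;   # placeholder host name

    # Only modern protocol versions (see the discussion of ssl_protocols below).
    ssl_protocols TLSv1.2 TLSv1.3;

    # Certificate and private key issued for this host (paths are placeholders).
    ssl_certificate     /etc/nginx/tls/example.com.pem;
    ssl_certificate_key /etc/nginx/tls/example.com.key;

    location / {
        # Forward requests to the WSGI server (e.g., Gunicorn) listening on localhost.
        proxy_pass http://127.0.0.1:8000/;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Prefix /;
    }
}
```

A block like this gets HTTPS working; whether the deployment is actually secure depends on the TLS parameters discussed next.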
The script reads the TLS configuration options in a given Nginx configuration file and prints a warning for any weak or insecure TLS options. We believe the script is helpful because the Flask documentation also refers to older OpenSSL versions, down to 1.0.2, which reached its official end of life at the end of 2019. These old OpenSSL versions contain algorithms with known cryptographic weaknesses. In the remainder of the article, we want to highlight three important TLS parameters that are checked by the script.

TLS Ciphers

The ssl_ciphers directive specifies the enabled ciphers in a format understood by the OpenSSL library, for example:

```
ssl_ciphers ALL:!aNULL:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
```

The full list of available ciphers can be obtained using the openssl ciphers command. However, not all ciphers supported by OpenSSL are considered secure. The Nginx documentation recommends the use of OpenSSL 1.0.2 or higher. Ciphers that should be avoided include:

RC4: A stream cipher that has been known to be vulnerable for a long time, irrespective of the key length.

DES and 3DES: DES is a very old cipher whose effective key length is far too short (56 bits). 3DES, or Triple DES, applies DES three times. It has a larger key space, but is quite inefficient.

MD5-based cipher suites: MD5 is a broken hashing algorithm susceptible to collision attacks. Its use weakens the security of the whole cipher suite.

Export-grade ciphers: Export-grade ciphers — usually recognizable by an EXP in their name — were intentionally weakened (to 40 or 56 bits) to comply with export restrictions from the 1990s. These ciphers are highly vulnerable to brute-force attacks.

Early Data in TLS 1.3

The ssl_early_data directive enables or disables so-called early data in TLS 1.3. If two TLS endpoints share a secret key (this is called a pre-shared key, or PSK), TLS 1.3 allows them to communicate over a secure channel right from the start. This is referred to as zero round-trip time (0-RTT) mode, which was added in TLS 1.3. It reduces TLS latency by allowing client Bob to send data to server Alice in the first round trip, without waiting for Alice's response. Bob uses the shared key to authenticate the server and to establish a secure channel for the early data, which is simply added to the standard 1-RTT handshake.

The downside is that 0-RTT data is less protected. First, forward secrecy does not hold for this data because it is encrypted using keys derived from the PSK rather than fresh, randomly generated key shares. This means that if the PSK is stolen, earlier recorded TLS sessions can be decrypted by the attacker. Second, 0-RTT data is not protected against replay attacks, in which legitimately encrypted and authenticated data is recorded by an attacker and replayed into the communication channel. Regular TLS data is protected against this type of attack by the server's random value; 0-RTT data, in contrast, does not depend on the ServerHello message and therefore lacks fresh randomness from the server. Enabling early data can therefore decrease TLS security.
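Whichever values you settle on for the directives above, the openssl ciphers command mentioned in the TLS Ciphers section lets you verify what a given cipher string actually enables. The cipher string below is only an example to show the syntax; run the command against the exact string from your own configuration.

```
# Expand an OpenSSL cipher string and list the resulting suites verbosely.
# The string shown here is illustrative, not a recommendation.
openssl ciphers -v 'ECDHE+AESGCM:!aNULL:!EXPORT:!RC4:!3DES:!MD5'
```

Any RC4, DES/3DES, MD5, or EXP entries in the output are a sign that the string should be tightened along the lines discussed above.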
Elliptic Curves

The ssl_ecdh_curve directive specifies one or more so-called elliptic curves to be used in Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) key agreement. Unfortunately, some elliptic curves available in OpenSSL 1.0.2 are insecure by today's standards:

SECP192R1 (prime192v1 or P-192)
SECP224R1 (P-224)
SECP160R1 and SECP160K1
Brainpool curves with a size below 256 bits

TLS Version

The ssl_protocols [SSLv2] [SSLv3] [TLSv1] [TLSv1.1] [TLSv1.2] [TLSv1.3] directive specifies the TLS versions supported by the reverse proxy. As already discussed, SSLv2 and SSLv3 contain serious known security weaknesses and must not be used. But there are also known attacks against TLS versions 1.0 and 1.1 that have been shown to work in practice. So only TLS versions 1.2 and 1.3 should be used (and the former with care).

Naming: SSL or TLS?

Let's get things straight. Practically everybody surfing the web today uses web addresses starting with https, which stands for Hypertext Transfer Protocol Secure. The "Secure" part is realized by a cryptographic protocol called Transport Layer Security, or TLS for short. SSL, on the other hand, refers to the initial versions of the protocol, dating back to the early 1990s.

To address the security needs of the emerging e-commerce sector, Netscape Communications started designing a new protocol, named Secure Sockets Layer (SSL), for establishing a secure channel between a web server and a web browser. SSLv1, the first version of the SSL protocol, had severe security issues, and Netscape never released its specification or implementation. In November 1994, Netscape publicly released SSLv2, the second version of the SSL protocol, and integrated it into Netscape Navigator 1.1 in March 1995. Soon after its release, however, a number of further security weaknesses were discovered in SSLv2. Learning from these flaws, Netscape redesigned SSL from scratch and released SSLv3 in late 1995.

In May 1996, the Internet Engineering Task Force (IETF) formed a working group to standardize an SSL-like protocol. The standardization process took around three years, and the TLS 1.0 standard was published as RFC 2246 in January 1999. While TLS 1.0 closely resembled SSLv3, it had no backward compatibility with SSLv3. Almost seven years later, in April 2006, the IETF published RFC 4346, defining the successor version, TLS 1.1. The publication of RFC 5246, and with it the new version TLS 1.2, followed in 2008. Finally, in August 2018, the IETF published RFC 8446, which specifies the newest TLS version, TLS 1.3.

Conclusion

Securing your Flask application with TLS is essential for protecting sensitive data and building user trust. By following the best practices outlined in this article — such as configuring strong ciphers, enabling secure TLS versions, and leveraging tools like TLSAudit — you can ensure a robust, production-ready setup. For a deeper dive into the intricacies of TLS and advanced cryptographic concepts, consider exploring the book TLS Cryptography in Depth by Dr. Paul Duplys and Dr. Roland Schmitz. TLS is the most important and widely used security protocol in the world. The book takes you on a journey through modern cryptography with TLS as the guiding light: it explains all the necessary cryptographic primitives and how they are used within TLS, and you'll explore the inner workings of TLS and its design structure.

Author Bio

Dr. Paul Duplys is chief expert for cybersecurity at the department for technical strategies and enabling within the Mobility sector of Robert Bosch GmbH, a Tier-1 automotive supplier and manufacturer of industrial, residential, and consumer goods.
Prior to this position, he spent over 12 years with Bosch Corporate Research, where he led the security and privacy research program and conducted applied research in various fields of information security. Paul's research interests include security automation, software security, security economics, software engineering, and AI. Paul holds a PhD in computer science from the University of Tuebingen, Germany.

Dr. Roland Schmitz has been a professor of internet security at Stuttgart Media University (HdM) since 2001. Prior to joining HdM, from 1995 to 2001, he worked as a research engineer at Deutsche Telekom, with a focus on mobile security and digital signature standardization. At HdM, Roland teaches courses on internet security, system security, security engineering, digital rights management, theoretical computer science, discrete mathematics, and game physics. He has published numerous scientific papers in the fields of internet and multimedia security and has authored and co-authored several books. Roland holds a PhD in mathematics from the Technical University of Braunschweig, Germany.


Exploring Microservices with Node.js

Daniel Kapexhiu
22 Nov 2024
10 min read
Introduction

The world of software development is constantly evolving, and one of the most significant shifts in recent years has been the move from monolithic architectures to microservices. In his book "Building Microservices with Node.js: Explore Microservices Applications and Migrate from a Monolith Architecture to Microservices," Daniel Kapexhiu offers a comprehensive guide for developers who wish to understand and implement microservices using Node.js. This article delves into the book's key themes, including an analysis of Node.js as a technology, best practices for JavaScript in microservices, and the unique insights that Kapexhiu brings to the table.

Node.js: The Backbone of Modern Microservices

Node.js has gained immense popularity as a runtime environment for building scalable network applications, particularly in the realm of microservices. It is built on Chrome's V8 JavaScript engine and uses an event-driven, non-blocking I/O model, which makes it lightweight and efficient. These characteristics are essential when dealing with microservices, where performance and scalability are paramount.

The author effectively highlights why Node.js is particularly suited to a microservices architecture. First, its asynchronous nature allows microservices to handle multiple requests concurrently without being bogged down by long-running processes. This is crucial in a microservices environment, where each service should be independently scalable and capable of handling a high load. Moreover, Node.js has a vast ecosystem of libraries and frameworks, such as Express.js and Koa.js, which simplifies the development of microservices. These tools provide a solid foundation for building RESTful APIs, which are often the backbone of microservices communication. The author emphasizes the importance of choosing the right tools within the Node.js ecosystem to ensure that microservices are not only performant but also maintainable and scalable.

Best Practices for JavaScript in Microservices

While Node.js provides a robust platform for building microservices, the importance of adhering to JavaScript best practices cannot be overstated. In his book, the author provides a thorough analysis of best practices for JavaScript within a microservices architecture, designed to ensure code quality, maintainability, and scalability.

One of the core principles the author advocates is modularity. JavaScript's flexible and dynamic nature allows developers to break applications down into smaller, reusable modules. This modular approach aligns perfectly with the microservices architecture, where each service is a distinct, self-contained module. By adhering to this principle, developers can create microservices that are easier to maintain and evolve over time.

The author also stresses the importance of following standard coding conventions and patterns. This includes using ES6/ES7 features such as arrow functions, destructuring, and async/await, which make the code more concise and readable and can also improve its performance. Additionally, he underscores the need for rigorous testing, including unit tests, integration tests, and end-to-end tests, to ensure that each microservice behaves as expected.

Another crucial aspect the book covers is error handling. In a microservices architecture, where multiple services interact with each other, robust error handling is essential to prevent cascading failures.
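To make this concrete — the sketch below is a generic illustration of the pattern, not an example taken from the book — a small Express-based service can keep its routes self-contained and push failures into a single, centralized error-handling middleware:

```javascript
// user-service.js — a minimal "user" microservice (illustrative only).
const express = require('express');

const app = express();
app.use(express.json());

// Tiny in-memory data store standing in for a real database.
const users = { 1: { id: 1, name: 'Ada' } };

app.get('/users/:id', (req, res, next) => {
  const user = users[req.params.id];
  if (!user) {
    // Hand errors to the centralized handler instead of failing ad hoc.
    const err = new Error('User not found');
    err.status = 404;
    return next(err);
  }
  res.json(user);
});

// Centralized error-handling middleware: one place to log and shape failures,
// so a single bad request cannot cascade into inconsistent responses.
app.use((err, req, res, next) => {
  console.error(err.message);
  res.status(err.status || 500).json({ error: err.message });
});

app.listen(3000, () => console.log('User service listening on port 3000'));
```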
The book provides practical examples of how to implement effective error-handling mechanisms in Node.js, ensuring that services can fail gracefully and recover quickly.

Problem-Solving with Microservices

Transitioning from a monolithic architecture to microservices is not without its challenges. The author does not shy away from discussing the potential pitfalls and complexities that developers might encounter during this transition. He offers practical advice on how to decompose a monolithic application into microservices, focusing on identifying the right boundaries between services and ensuring that they communicate efficiently.

One of the key challenges in a microservices architecture is managing data consistency across services. The author addresses this issue by discussing different strategies for managing distributed data, such as event sourcing and the use of a centralized message broker. He provides examples of how to implement these strategies using Node.js, highlighting the trade-offs involved in each approach.

Another common problem in microservices is handling cross-cutting concerns such as authentication, logging, and monitoring. The author suggests solutions that leverage middleware and service mesh technologies to manage these concerns without introducing tight coupling between services. This allows developers to maintain the independence of each microservice while still addressing the broader needs of the application.

Unique Insights and Experiences

What sets this book apart is the depth of practical insight and real-world experience that the author shares. The book goes beyond the theoretical aspects of microservices and Node.js to provide concrete examples and case studies from his own experience in the field. These insights are invaluable for developers who are embarking on their microservices journey.

For instance, the author discusses the importance of cultural and organizational changes when adopting microservices. He explains how the shift to microservices often requires changes in team structure, development processes, and even the way developers think about code. By sharing his experiences with these challenges, the author helps readers anticipate and navigate the broader implications of adopting microservices.

Moreover, the author offers guidance on the operational aspects of microservices, such as deploying, monitoring, and scaling them in production. He emphasizes the need for automation and continuous integration/continuous deployment (CI/CD) pipelines to manage the complexity of deploying multiple microservices. His advice is grounded in real-world scenarios, making it highly actionable for developers.

Conclusion

"Building Microservices with Node.js: Explore Microservices Applications and Migrate from a Monolith Architecture to Microservices" by Daniel Kapexhiu is an essential read for any developer looking to understand and implement microservices using Node.js. The book offers a comprehensive guide that covers both the technical and operational aspects of microservices, with a strong emphasis on best practices and real-world problem-solving. The author's deep understanding of Node.js as a technology, combined with his practical insights and experiences, makes this book a valuable resource for anyone looking to build scalable, maintainable, and efficient microservices.
Whether you are just starting your journey into microservices or are looking to refine your existing microservices architecture, this book provides the knowledge and tools you need to succeed.

Author Bio

Daniel Kapexhiu is a software developer with over 6 years of experience developing web applications using the latest frontend and backend technologies. Daniel has been studying and learning software development for about 12 years and has extensive expertise in programming. He specializes in the JavaScript ecosystem and stays up to date with new releases of ECMAScript. He is ever eager to learn and master the new tools and paradigms of JavaScript.


Revolutionising Work and Everyday Life with ChatGPT

M.T. White
16 Dec 2024
10 min read
Introduction

ChatGPT is a revolutionary new technology that is making a large impact on society. The full impact of ChatGPT cannot be known at the time of writing this article because of how novel the technology is. What can be said is that since its introduction, many industries have been trying to leverage it to increase productivity, and everyday people are trying to learn to leverage it as well. Overall, ChatGPT and similar systems are very new, and a full picture of how to leverage them will take more time to emerge. This article explores how ChatGPT can be used in everyday life through a few use cases.

What is ChatGPT?

Before we begin, it is important to understand what ChatGPT is and what it isn't. In lay terms, ChatGPT is a super-advanced chatbot. More specifically, ChatGPT is a generative AI that uses Natural Language Processing (NLP) to create a dialog between a user and itself. ChatGPT and similar systems are what are known as Large Language Models (LLMs). In short, for AI models to work, they have to be trained on data. To train LLMs, engineers use vast amounts of text such as books, articles, journals, and so on. The result is a system like ChatGPT that has a vast knowledge base on many different subjects. Before we can explore how to use ChatGPT for everyday life, we need to look at how NOT to use it.

How not to use ChatGPT

ChatGPT is very powerful and can be used for many different things; however, it is important to understand that ChatGPT is neither a sage nor infallible. Remember, ChatGPT only knows what it was trained on. This means that if the information it was taught was wrong or outdated, so too will be the responses it provides. As of this writing, ChatGPT cannot and should not be used as a replacement for humans. Even with the answers ChatGPT gives, a decent level of domain knowledge is still required to properly format and use its responses. As such, it is important to take its suggestions with a certain amount of skepticism. The best way to think of ChatGPT is as an extremely smart friend with a wide range of knowledge. Though that friend may be very knowledgeable, they can and will be wrong from time to time. With that, how does one use ChatGPT?

How to use ChatGPT for everyday life

Using ChatGPT starts with what is called a prompt. A prompt can be thought of as a query that the user feeds into ChatGPT, a bit like a Google search. It will be a statement that asks ChatGPT a question or instructs it to perform a certain task. Since ChatGPT uses NLP, the prompt does not have to be formatted in a certain way, nor does it require a specific syntax. In other words, a prompt can be entered the same way a user would pose the instruction to another person. However, it is good to give ChatGPT enough information to adequately describe the problem. An example of a prompt is as follows:

How can ChatGPT be used for everyday life?

Once the prompt is run, ChatGPT will produce text with suggestions, answers, software, or whatever was asked for. The output that is produced is called a response, and it contains the answer to the request or question.

There are a lot of ways ChatGPT can be used in everyday life. In terms of professional life, ChatGPT can make an excellent assistant. For most "text-based" jobs — that is, jobs that don't require design work such as drawing diagrams — ChatGPT can be an excellent assistant.
For example, suppose a programmer needs a small piece of software for a project — say, a SQL query that can pull all the rows between two dates. The user could use a prompt like the following:

Write a SQL query that will pull all rows between January 3 and March 31.

This prompt will produce a response along the following lines:

To pull all rows between January 3 and March 31 from a table, you'll need to use the WHERE clause to filter the results based on the date column. Assuming you have a date column named "date_column" in your table, here's the SQL query:

```
SELECT *
FROM your_table
WHERE date_column >= '2023-01-03'
  AND date_column <= '2023-03-31';
```

Replace "your_table" with the actual name of your table, and "date_column" with the name of the column that contains the dates you want to filter. The query will select all rows where the "date_column" is on or after January 3, 2023, and on or before March 31, 2023.

For those who are not familiar with SQL, the prompt produced a simple query with generic placeholders that will pull the correct data out of a database.

Not everyone is a programmer, so how could a non-programmer use ChatGPT as an assistant? One way is by composing emails. Suppose a salesperson needs to inform Tom, their boss, via email that a $1 million deal was closed today. They could use a prompt like the following:

Write Tom an email saying that the deal with Company X went through, and we made $1 million.

The response would be something along the lines of:

Subject: Successful Deal with Company X - $1 Million Revenue Achieved

Hi Tom,

I hope this email finds you well. I am thrilled to share some exciting news with you! The deal with Company X has successfully gone through, and I am delighted to inform you that we have made an astounding $1 million in revenue from this collaboration.

This achievement wouldn't have been possible without the hard work and dedication of our entire team. It reflects the trust and confidence our partners have in our capabilities, and I believe it marks a significant milestone for our company's growth and success.

I want to express my gratitude to you for your valuable contributions and expertise that played a crucial role in making this deal a reality. Your efforts have been commendable, and I am truly grateful for your commitment to the success of this venture.

Let's take a moment to celebrate this triumph together. Your exceptional skills and commitment to excellence have made this accomplishment possible, and I am genuinely proud to have you as part of our team.

Looking forward to even more prosperous ventures in the future. Once again, congratulations to you and the entire team!

Best regards,
[Your Name]
[Your Title/Position]
[Company Name]
[Email]
[Phone Number]

In this case, all the user has to do is proofread the email, copy it, and send it. For applications such as this, proofreading is very important. As stated before, ChatGPT is neither a mind reader nor a sage, so it may produce emails with wrong information, illogical wording, or other problems. It pays to read the message before sending it.

Learning Tool

A very powerful use of ChatGPT is as a learning tool. For example, students can use it to help them master a new subject, generate practice problems, and even check answers to an extent.
Assume the user needs a few extra practice problems for their English homework. They could enter a prompt like the following:

Generate some grammar practice problems for a 1st year English class

This would yield the following output:

Of course! Here are some grammar practice problems suitable for a 1st-year English class:

1. Subject-Verb Agreement: Choose the correct verb form that agrees with the subject in each sentence:
a) The cat (purr/purrs) when it is happy.
b) My friend and I (is/are) going to the movies tonight.
c) Neither the students nor the teacher (understands/understand) the assignment.

2. Pronoun Usage: Select the appropriate pronoun to complete each sentence:
a) She invited Sarah and (I/me) to the party.
b) My sister and (her/she) are going shopping.
c) The book is for (he/him) and his brother.

3. Articles (a/an/the): Insert the correct article (a, an, the, or no article) in each sentence:
a) Can you pass me (a/an) apple, please?
b) I need to buy (the) new shoes for the dance.
c) My friend has (no article) car.

4. Prepositions: Choose the correct preposition to complete each sentence:
a) We arrived (in/on/at) the airport fifteen minutes early.
b) The cat is sleeping (under/below/above) the table.
c) She's going to the store (by/with/for) her mom.

5. Verb Tenses: Identify the correct verb tense used in each sentence:
a) They (will be/was) studying for the exam all night.
b) By the time we arrived, the party (had started/has started).
c) I (am/was) watching a movie right now.

6. Adjective vs. Adverb: Decide whether the word in parentheses should be an adjective or adverb in the sentence:
a) She sings (beautiful/beautifully).
b) The (happy/happily) children played in the park.
c) The (quick/quickly) fox jumped over the fence.

7. Negation: Add the necessary negative word to each sentence:
a) I have (no/not) seen him today.
b) She does (not/none) like coffee.
c) We will (not/none) be attending the party.

8. Passive Voice: Rewrite the following sentences in the passive voice:
a) The chef cooked the meal.
b) The teacher is explaining the lesson.
c) The storm destroyed the old building.

9. Comparative and Superlative Forms: Complete each sentence with the correct comparative or superlative form of the adjective:
a) This book is (interesting/more interesting/most interesting) than the last one.
b) She is (tall/taller/tallest) than her sister.
c) It was the (big/biggest/bigger) cake at the party.

10. Reported Speech: Rewrite the following sentences using reported speech:
a) Mary said, "I am going to the store."
b) "I will call you later," said John.
c) "They have arrived," she said.

These grammar practice problems aim to reinforce essential grammar rules and concepts, helping 1st-year English students develop their language skills and proficiency.

As can be seen, ChatGPT generated a number of questions of the kind one would usually find in an English 101 class. If the user wanted more specific questions, they could reword the prompt accordingly.

ChatGPT is also a great teaching tool. Teachers can use it to generate lesson plans, write intros for learning modules, generate problems, and more. Overall, ChatGPT can be an excellent tool for academia in general.

Now, it is important to remember that ChatGPT is not all-knowing. It can only generate responses based on the data that it was trained on.
This means that if the user asks questions about a subject for which there is little or no publicly available data, the response probably isn't going to be of the highest quality. For example, if someone asked:

How do I build a working time machine?

they are going to get the following:

Building a working time machine, as depicted in science fiction, is currently beyond our technological capabilities and goes against the known laws of physics. Time travel, as popularly imagined, involves traveling backward or forward in time, similar to how we move through space. As of my last update in September 2021, time travel remains a concept explored mainly in theoretical physics, and no practical time machine has been built or proven to be possible. Several theories, such as the theory of relativity, suggest that time travel could be theoretically feasible under certain conditions, but these ideas are still purely theoretical and have not been put into practice.

It's important to remember that building a time machine would present enormous scientific, technological, and ethical challenges. It could raise paradoxes, such as the "grandfather paradox," where a time traveler could potentially alter past events and create inconsistencies in history.

As of now, time travel remains a fascinating topic in science fiction and theoretical physics, but it is not something that can be practically accomplished with our current understanding and technology. The scientific community continues to explore the nature of time and the universe, but building a working time machine as portrayed in movies and books is not yet a reality.

ChatGPT gave some general information on time travel, but unfortunately it didn't generate any schematics. Again, this goes back to ChatGPT not being a sage: it only knows what it was "taught." Since there is no data on how to build a working time machine, ChatGPT could not fully answer the prompt.

If you look at the English question prompt above, you will notice that the practice questions were quite broad. It is common to have to "dig" with ChatGPT — in other words, a person may have to refine their queries to get what they need. To get some practice, try rewording the prompt to generate practice questions for the passive voice only.

Summary

ChatGPT is a tool, and like any other tool, what it's used for is up to the user. As this article has shown, ChatGPT is an excellent tool for helping a person through their day by generating software, emails, and so on. ChatGPT can also be a great learning or teaching aid, helping students and teachers generate practice problems, create lesson plans, and much more. However, as has been stated numerous times, unless ChatGPT has been trained on something, it does not know about it. This means that asking it things like how to build a time machine, or about narrow domain-specific concepts, isn't going to return quality responses. And even when ChatGPT has been trained on material relevant to the prompt, it may not always generate a quality response. No matter the use case, the response should be vetted for accuracy. This may mean doing a little extra research, testing the output, or doing whatever else is needed to verify the response.

Overall, at the time of writing, ChatGPT is less than a year old. This means that the full implications of using ChatGPT are not yet understood, nor is how to fully leverage it. What can be said is that ChatGPT and similar LLM systems will probably be the next Google.
In terms of everyday use, the only true inhibitors are the user's imagination and the data that was used to train ChatGPT.

Author Bio

M.T. White has been programming since the age of 12. His fascination with robotics flourished when he was a child programming microcontrollers such as Arduino. M.T. currently holds an undergraduate degree in mathematics and a master's degree in software engineering, and is working on an MBA in IT project management. M.T. works as a software developer for a major US defense contractor and is an adjunct CIS instructor at ECPI University. His background mostly stems from the automation industry, where he programmed PLCs and HMIs for many different types of applications. M.T. has programmed many different brands of PLCs over the years and has developed HMIs using many different tools.

Author of the book: Mastering PLC Programming


The Complete Guide to NLP: Foundations, Techniques, and Large Language Models

Lior Gazit, Meysam Ghaffari
13 Nov 2024
10 min read
Introduction

In the rapidly evolving field of Natural Language Processing (NLP), staying ahead of technological advancements while mastering foundational principles is crucial for professionals aiming to drive innovation. "Mastering NLP from Foundations to LLMs" by Packt Publishing serves as a comprehensive guide for those seeking to deepen their expertise. Authored by leading figures in Machine Learning and NLP, this text bridges the gap between theoretical knowledge and practical applications. From understanding the mathematical underpinnings to implementing sophisticated NLP models, the book equips readers with the skills necessary to solve today's complex challenges. With insights into Large Language Models (LLMs) and emerging trends, it is an essential resource for both aspiring and seasoned NLP practitioners, providing the tools needed to excel in the data-driven world of AI.

In-Depth Analysis of Technology

NLP is at the forefront of technological innovation, transforming how machines interpret, generate, and interact with human language. Its significance spans multiple industries, including healthcare, finance, and customer service. At the core of NLP lies a robust integration of foundational techniques such as linear algebra, statistics, and Machine Learning. Linear algebra is fundamental for converting textual data into numerical representations, such as word embeddings. Statistics play a key role in understanding data distributions and applying probabilistic models to infer meaning from text. Machine Learning algorithms, such as decision trees, support vector machines, and neural networks, are used to recognize patterns and make predictions from text data.

"Mastering NLP from Foundations to LLMs" delves into these principles, providing extensive coverage of how they underpin complex NLP tasks. For example, text classification leverages Machine Learning to categorize documents, enhancing functionality like spam detection and content organization. Sentiment analysis uses statistical models to gauge user opinions, helping businesses understand consumer feedback. Chatbots combine these techniques to generate human-like responses, improving user interaction. By meticulously elucidating these technologies, the book highlights their practical applications, demonstrating how foundational knowledge translates into solving real-world problems. This seamless integration of theory and practice makes it an indispensable resource for modern tech professionals seeking to master NLP.
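As a toy illustration of the text-classification and sentiment-analysis tasks just described — this is a generic scikit-learn sketch with a made-up dataset, not code from the book — a TF-IDF representation (the linear-algebra step of turning text into vectors) can be combined with a simple statistical classifier:

```python
# Minimal sentiment/text-classification sketch using scikit-learn (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, invented training set standing in for real labeled customer feedback.
texts = [
    "great product, works perfectly",
    "terrible support, very disappointed",
    "love it, would buy again",
    "broke after one day, awful",
]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF turns each document into a numeric vector; the classifier learns patterns in those vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["the support was awful"]))  # expected: ['negative']
```

Real systems would add the preprocessing, class-imbalance handling, and fairness checks the book discusses, but the pipeline shape stays the same.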
Adjacent Topics

The realm of NLP is witnessing groundbreaking advancements, particularly in LLMs and hybrid learning paradigms that integrate multimodal data for richer contextual understanding. These innovations are setting new benchmarks in text understanding and generation, driving enhanced applications in areas like automated customer service and real-time translation. "Mastering NLP from Foundations to LLMs" emphasizes best practices in text preprocessing, such as data cleaning, normalization, and tokenization, which are crucial for improving model performance. Ensuring robustness and fairness in NLP models involves techniques like resampling, weighted loss functions, and bias mitigation strategies to address inherent data disparities.

The book also looks ahead at future directions in NLP, as predicted by industry experts. These include the rise of AI-driven organizational structures in which decentralized AI work is balanced with centralized data governance. Additionally, there is a growing shift towards smaller, more efficient models that maintain high performance with reduced computational resources. "Mastering NLP from Foundations to LLMs" encapsulates these insights, offering a forward-looking perspective on NLP and providing readers with a roadmap to stay ahead in this rapidly advancing field.

Problem-Solving with Technology

"Mastering NLP from Foundations to LLMs" addresses several critical issues in NLP through innovative methodologies. The book first presents common workflows with LLMs, such as prompting via APIs and building a LangChain pipeline. From there, the book takes on heavier challenges. One significant challenge is managing multiple models and optimizing their performance for specific tasks. The book introduces the concept of using multiple LLMs in parallel, with each model specialized for a particular function, such as a medical domain or backend development in Python. This approach reduces overall model size and increases efficiency by leveraging specialized models rather than a single, monolithic one.

Another issue is optimizing resource allocation. The book discusses strategies like prompt compression for cost reduction, which involves compacting input prompts to minimize token count without sacrificing performance. This technique addresses the high costs associated with large-scale model deployments, offering businesses a cost-effective way to implement NLP solutions.

Additionally, the book explores fault-tolerant multi-agent systems using frameworks like Microsoft's AutoGen. By assigning specific roles to different LLMs, these systems can work together to accomplish complex tasks, such as professional-level code generation and error checking. This method enhances the reliability and robustness of AI-assisted solutions. Through these problem-solving capabilities, "Mastering NLP from Foundations to LLMs" provides practical solutions that make advanced technologies more accessible and efficient for real-world applications.

Unique Insights and Experiences

Chapter 11 of "Mastering NLP from Foundations to LLMs" offers a wealth of expert insights that illuminate the future of NLP. Contributions from industry leaders like Xavier Amatriain (VP, Google) and Nitzan Mekel-Bobrov (CAIO, eBay) explore hybrid learning paradigms and AI integration into organizational structures, shedding light on emerging trends and practical applications.

The authors, Lior Gazit and Meysam Ghaffari, share their personal experiences of implementing NLP technologies in diverse sectors, ranging from finance to healthcare. Their journey underscores the importance of a solid foundation in mathematical and statistical principles, combined with innovative problem-solving approaches. The book empowers readers to tackle advanced NLP challenges by providing comprehensive techniques and actionable advice. From addressing class imbalances to enhancing model robustness and fairness, the authors equip practitioners with the skills needed to develop robust NLP solutions, ensuring that readers are well prepared to push the boundaries of what's possible in the field.

Conclusion

"Mastering NLP from Foundations to LLMs" is an 11-course meal that offers a comprehensive journey through the intricate landscape of NLP. It serves as both a foundational text and an advanced guide, making it invaluable for beginners seeking to establish a solid grounding and for experienced practitioners aiming to deepen their expertise.
Covering everything from basic mathematical principles to advanced NLP applications like LLMs, the book stands out as an essential resource. Throughout its chapters, readers gain insights into practical problem-solving strategies, best practices in text preprocessing, and emerging trends predicted by industry experts. "Mastering NLP from Foundations to LLMs" equips readers with the skills needed to tackle advanced NLP challenges, making it a comprehensive, indispensable guide for anyone looking to master the evolving field of NLP. For detailed guidance and expert advice, dive into the book and unlock the full potential of NLP techniques and applications in your projects.

Author Bio

Lior Gazit is a highly skilled Machine Learning professional with a proven track record of building and leading teams that drive business growth. He is an expert in Natural Language Processing and has successfully developed innovative Machine Learning pipelines and products. He holds a Master's degree and has published in peer-reviewed journals and conferences. As a Senior Director of the Machine Learning group in the financial sector, and a Principal Machine Learning Advisor at an emerging startup, Lior is a respected leader in the industry, with a wealth of knowledge and experience to share. With much passion and inspiration, Lior is dedicated to using Machine Learning to drive positive change and growth in his organizations.

Meysam Ghaffari is a Senior Data Scientist with a strong background in Natural Language Processing and Deep Learning. He currently works at MSKCC, where he specializes in developing and improving Machine Learning and NLP models for healthcare problems. He has over 9 years of experience in Machine Learning and over 4 years of experience in NLP and Deep Learning. He received his Ph.D. in Computer Science from Florida State University, his M.S. in Computer Science (Artificial Intelligence) from Isfahan University of Technology, and his B.S. in Computer Science from Iran University of Science and Technology. He also worked as a postdoctoral research associate at the University of Wisconsin-Madison before joining MSKCC.


Automate Your Microsoft Intune Tasks with Graph API

Andrew Taylor
19 Nov 2024
10 min read
Why now is the time to start your automation journey with Intune and Graph

With more and more organizations moving to Microsoft Intune and IT departments under constant strain, automating your regular tasks is an excellent way to free up time to concentrate on your many other important tasks.

When dealing with Microsoft Intune and the related Microsoft Entra tasks, everything clicked within the portal UI sends API requests to Microsoft Graph, which sits underneath everything and controls exactly what is happening and where. Fortunately, Microsoft Graph is a public API, so anything being carried out within the portal can be scripted and automated.

Imagine a world where, by using automation, you log in to your machine in the morning and there waiting for you is an email or Teams message that ran overnight, containing everything you need to know about your environment. You can take this information to quickly resolve any new issues and be extra proactive with your user base, calling them to resolve issues before they have even noticed themselves. This is just the tip of the iceberg of what can be done with Microsoft Graph; the possibilities are endless.

Microsoft Graph is a web API that can be accessed and manipulated via most programming and scripting languages, so if you have a preferred language, you can get started extremely quickly. For those starting out, PowerShell is an excellent choice, as the Microsoft Graph SDK includes modules that take the effort out of the initial connection and help write the requests. For those more experienced, switching to the C# SDK opens up more scalability and quicker performance, but ultimately it is the same web requests underneath, so once you have a basic knowledge of the API, moving these skills between languages is much easier.

When looking to learn the API, an excellent starting point is to use the F12 browser tools and select Network. Then click around in the portal and have a look at the network traffic. This will be in the form of GET, POST, PUT, DELETE, and BATCH requests, depending on what action is being performed. GET is used to retrieve information and is one-way traffic: retrieving from Graph and returning to the client. POST and PUT are used to send data to Graph. DELETE is fairly self-explanatory and is used to delete records. BATCH is used to increase performance in more complex tasks: it groups multiple Graph API calls into one command, which reduces the number of calls and improves performance. It works extremely well, but starting with the more basic commands is always recommended.

Once you have mastered Graph calls from a local device with interactive authentication, the next step is to create Entra app registrations and run locally, but with non-interactive authentication. This will feed into true automation, where tasks can be set to run without any user involvement; at this point, learning about Azure Automation accounts and Azure Function and Logic Apps will prove incredibly useful. For larger environments, you can take it a step further and use Azure DevOps pipelines to trigger tasks and even implement approval processes.

Some real-world examples of automation with Graph include new environment configuration, policy management, and application management, right through to documenting and monitoring policies. Once you have a basic knowledge of the Graph API and PowerShell, it is simply a case of slotting them together and watching where the process takes you.
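As a small taste of what a first interactive GET looks like — a generic sketch, not an excerpt from the Microsoft Intune Cookbook — the Microsoft Graph PowerShell SDK lets you sign in and issue a raw request against an Intune endpoint. The permission scope and endpoint shown here are just examples; substitute whichever data you actually need to read.

```powershell
# Sign in interactively, requesting read access to Intune managed devices (example scope).
Connect-MgGraph -Scopes "DeviceManagementManagedDevices.Read.All"

# Issue a raw GET against the Graph endpoint for managed devices.
$devices = Invoke-MgGraphRequest -Method GET -Uri "https://graph.microsoft.com/v1.0/deviceManagement/managedDevices"

# The response body exposes a 'value' collection; list device names and operating systems.
$devices.value | ForEach-Object { "$($_.deviceName) - $($_.operatingSystem)" }
```

The same URI is exactly what you would see in the F12 network trace when browsing devices in the portal, which is what makes the browser-tools approach such a useful way to learn the API.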
The learning never stops; before you know it, you will be creating tools for your other IT staff to use to quickly retrieve passwords on the go, or to carry out standard tasks without needing elevated privileges.

Now, I know what you are thinking: this all sounds fantastic and exactly what I need, but how do I get started, and how do I find the time to learn a new skill like this?

We can start with time management. I am sure that throughout your career you have had to learn new software, systems, and technologies without any formal training, and the best way to do that is by learning by doing. The same applies here: when you are completing your normal tasks, simply have the F12 network tools open and have a quick look at the URLs and requests being sent. If you can, try to find a few minutes per day to do some practice scripts, ideally in a development environment; if not, start with GET requests, which cannot do any damage.

To take it further and learn more about PowerShell, Graph, and Intune, check out my "Microsoft Intune Cookbook," which runs through creating a tenant from scratch, both in the portal and via Graph, including code samples for everything possible within the Intune portal. You can use these samples to expand upon and meet your needs while learning about both Intune and Graph.

Author Bio

Andrew Taylor is an End-User Compute architect with 20 years of IT experience across industries and a particular interest in Microsoft Cloud technologies, PowerShell, and Microsoft Graph. Andrew graduated with a degree in Business Studies in 2004 from Lancaster University and has since obtained numerous Microsoft certifications, including Microsoft 365 Enterprise Administrator Expert, Azure Solutions Architect Expert, and Cybersecurity Architect Expert, among others. He currently works as an EUC Architect for an IT company in the United Kingdom, planning and automating products across the EUC space. Andrew lives on the coast in the North East of England with his wife and two daughters.


Artificial Intelligence in Game Development: Understanding Behavior Trees

Marco Secchi
18 Nov 2024
10 min read
Introduction

In the wild world of video games, you'll inevitably encounter a foe that needs to be both engaging and captivating. This opponent isn't just a bunch of nice-to-see polygons and textures; it needs to be a challenge that'll keep your players hooked to the screen. Let's be honest: as a game developer, crafting a truly engaging opponent is often a challenge that rivals the one your players will face!

In video games, we often use the term Artificial Intelligence (AI) to describe characters that are not controlled by the player, whether they are enemies or friendly entities. There are countless ways to develop compelling characters in video games. In this article, we'll explore one specific solution offered by Unreal Engine: behavior trees.

Note: Citations come from my Artificial Intelligence in Unreal Engine 5 book.

Using the Unreal Shooting Gym Project

For this article, I have created a dedicated project called Unreal Shooting Gym. You can freely download it from GitHub (https://github.com/marcosecchi/unreal-shooting-gym) and open it with Unreal Engine 5.4. Once opened, you should see a level showing a lab with a set of targets and a small robot armed with a gun (a.k.a. RoboGun), as shown in Figure 1.

[Figure 1. The project level]

If you hit the Play button, you should notice the RoboGun rotating toward a target while shooting. Once the target has been hit, the RoboGun will start rotating towards another one. All this logic has been achieved through a behavior tree, so let's see what it is all about.

Behavior Trees

"In the universe of game development, behavior trees are hierarchical structures that govern the decision-making processes of AI characters, determining their actions and reactions during gameplay."

Unreal Engine offers a solid framework for handling behavior trees based on two main elements: the blackboard and behavior tree assets.

Blackboard Asset

"In Unreal Engine, the Blackboard [...] acts as a memory space – some sort of brain – where AI agents can read and write data during their decision-making process."

In the AI project folder, you can double-click the BB_Robogun asset to open it. You will be presented with the blackboard which, as you can see from Figure 2, is quite simple to understand.

[Figure 2. The AI blackboard]

As you can see, there are a couple of variables – called keys – that are used to store a reference to the actor owning the behavior tree – in this case, the RoboGun – and to the target object that will be used to rotate the RoboGun.

Behavior Tree Asset

"In Unreal Engine, behavior trees are assets that are edited in a similar way to Blueprints – that is, visually – by adding and linking a set of nodes with specific functionalities to form a behavior tree graph."

Now, double-click the BT_RoboGun asset located in the AI folder to open the behavior tree. You should see the tree structure depicted in Figure 3.

[Figure 3. The AI behavior tree]

Although this is a pretty simple piece of behavior logic, there is a lot going on here.
First of all, you will notice that there is a Root node; this is where the behavior logic starts from. After that, you will see three gray-colored nodes; these are called composite nodes.

"Composite nodes define the root of a branch and set the rules for its execution."

Each of them behaves differently, but it is sufficient to say that they control the subtree that will be executed; as an example, the Shoot Sequence node will execute all of its subtrees one after the other.

The purple-colored nodes are called tasks, and they are basically the leaves of the tree, whose aim is to execute actions. Unreal Engine comes with some predefined tasks, but you can create your own through Blueprints or C++. As an example, consider the Shoot task depicted in Figure 4.

[Figure 4. The Shoot task]

In this Blueprint, when the task is executed, it will call the Shoot method – by means of a ShootInterface – and then end the execution with success. For a slightly more complex task, please check the BTTask_SeekTarget asset.

Going back to the behavior tree, you will notice that the Find Random Target node has a blue-colored section called Is Target Set? This is a decorator.

"Decorators provide a way to add additional functionality or conditions to the execution of a portion of a behavior tree."

In our case, the decorator checks whether the TargetActor blackboard key is not set; the corresponding task will be executed only if that key is not set – that is, if we have no viable target. If the target is set, this decorator will block task execution, and the parent selector node – the Root Selector node – will execute the next subtree.

Environment Queries

Unreal Engine provides an Environment Query System (EQS) framework that allows data collection about the virtual environment, so that AI agents can make informed decisions based on the results. In our behavior tree, we run an environment query to find a viable target in the Find Random Target task. The query I have created – called EQ_FindTarget – is pretty simple, as it just queries the environment looking for instances of the class BP_Target, as shown in Figure 5.

[Figure 5. The environment query]

Pawn and Controller

Once you have created your behavior tree, you will need to execute it through an AIController, the class that is used to possess pawns or characters in order to make them proper AI agents. In the Blueprints folder, you can double-click the RoboGunController asset to check the pretty self-explanatory code depicted in Figure 6.

[Figure 6. The character controller code]

As you can see, it's just a matter of running a behavior tree asset. Easy, isn't it? If you open the BP_RoboGun asset, you will notice that, in the Details panel, I have set the AI Controller Class to RoboGunController; this will make the RoboGun pawn be automatically possessed by the RoboGunController.
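The same wiring can also be done in C++ rather than Blueprints. The snippet below is a generic sketch — the class and file names are hypothetical and not part of the Unreal Shooting Gym project — showing an AIController subclass that runs an assigned behavior tree as soon as it possesses a pawn.

```cpp
// ShooterAIController.h — hypothetical example class, not taken from the project.
#pragma once

#include "CoreMinimal.h"
#include "AIController.h"
#include "BehaviorTree/BehaviorTree.h"
#include "ShooterAIController.generated.h"

UCLASS()
class AShooterAIController : public AAIController
{
    GENERATED_BODY()

public:
    // Behavior tree asset to run, assignable in the editor (e.g., BT_RoboGun).
    UPROPERTY(EditDefaultsOnly, Category = "AI")
    TObjectPtr<UBehaviorTree> BehaviorTreeAsset;

protected:
    virtual void OnPossess(APawn* InPawn) override
    {
        Super::OnPossess(InPawn);

        // Start executing the tree; this also sets up its associated blackboard.
        if (BehaviorTreeAsset)
        {
            RunBehaviorTree(BehaviorTreeAsset);
        }
    }
};
```

In a real project, the module would also need AIModule (and typically GameplayTasks) listed among its build dependencies.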
Conclusion

This concludes this brief overview of the behavior tree system; I encourage you to explore the possibilities and more advanced features – such as writing your code the C++ way – by reading my new book "Artificial Intelligence in Unreal Engine 5"; I promise you it will be an informative and, sometimes, funny journey!

Author Bio

Marco Secchi is a freelance game programmer who graduated in Computer Engineering at the Polytechnic University of Milan. He is currently a lecturer on the BA in Creative Technologies and the MA in Creative Media Production, and he mentors BA students in their final thesis projects. In his spare time, he reads (a lot), plays (less than he would like) and practices (to some extent) CrossFit.

Understanding Memory Allocation and Deallocation in the .NET Common Language Runtime (CLR)

Trevoir Williams
29 Oct 2024
10 min read
Introduction
This article provides an in-depth exploration of memory allocation and deallocation in the .NET Common Language Runtime (CLR), covering essential concepts and mechanisms that every .NET developer should understand for optimal application performance. Starting with the fundamentals of stack and heap memory allocation, we delve into how the CLR manages different types of data and the roles these areas play in memory efficiency. We also examine the CLR's generational garbage collection model, which is designed to handle short-lived and long-lived objects efficiently, minimizing resource waste and reducing memory fragmentation. To help developers apply these concepts practically, the article includes best practices for memory management, such as optimizing object creation, managing unmanaged resources with IDisposable, and leveraging profiling tools. This knowledge equips developers to write .NET applications that are not only memory-efficient but also maintainable and scalable.

Understanding Memory Allocation and Deallocation in the .NET Common Language Runtime (CLR)
Memory management is a cornerstone of software development, and in the .NET ecosystem, the Common Language Runtime (CLR) plays a pivotal role in how memory is allocated and deallocated. The CLR abstracts much of the complexity involved in memory management, enabling developers to focus more on building applications than managing resources. Understanding how memory allocation and deallocation work under the hood can help you write more efficient and performant .NET applications.

Memory Allocation in the CLR
When you create objects in a .NET application, the CLR allocates memory. This process involves several key components, including the stack, heap, and garbage collector. In .NET, memory is allocated in two main areas: the stack and the heap.

Stack Allocation: The stack is a Last-In-First-Out (LIFO) data structure for storing value types and method calls. Variables stored on the stack are automatically managed, meaning that when a method exits, all its local variables are popped off the stack, and the memory is reclaimed. This process is very efficient because the stack operates linearly and predictably.

Heap Allocation: On the other hand, the heap is used for reference types (such as objects and arrays). Memory on the heap is allocated dynamically, meaning that the size and lifespan of objects are not known until runtime. When you create a new object, memory is allocated on the heap, and a reference to that memory is returned to the stack, where the reference type variable is stored.

When a .NET application starts, the CLR reserves a contiguous block of memory called the managed heap. This is where all reference-type objects are stored. The managed heap is divided into three generations (0, 1, and 2), which are part of the Garbage Collector (GC) strategy to optimize memory management:

Generation 0: Short-lived objects are initially allocated here. This is typically where small and temporary objects reside.
Generation 1: Acts as a buffer between short-lived and long-lived objects. Objects that survive a garbage collection in Generation 0 are promoted to Generation 1.
Generation 2: Long-lived objects like static data reside here. Objects that survive multiple garbage collections are eventually moved to this generation.

When a new object is created, the CLR checks the available space in Generation 0 and allocates memory for the object.
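To make the stack/heap split and the generational model concrete, here is a small, self-contained C# sketch. It is not taken from the article or the book; the printed generation numbers are typical but depend on the GC configuration of your runtime.

using System;

class AllocationDemo
{
    static void Main()
    {
        int counter = 42;                 // value type: stored on the stack for this method call
        var buffer = new byte[1024];      // reference type: allocated on the managed heap

        // A freshly allocated heap object normally starts in Generation 0.
        Console.WriteLine($"counter = {counter}, buffer generation = {GC.GetGeneration(buffer)}");

        // Force a collection (for demonstration only; never do this routinely in production code).
        GC.Collect();
        GC.WaitForPendingFinalizers();

        // The surviving object has typically been promoted to an older generation.
        Console.WriteLine($"buffer generation after a collection = {GC.GetGeneration(buffer)}");

        GC.KeepAlive(buffer);             // keep the reference reachable until this point
    }
}

Running this usually prints generation 0 before the forced collection and generation 1 afterwards, which is exactly the promotion behavior described above.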
If Generation 0 is full, the GC is triggered to reclaim memory by removing objects that are no longer in use.

Memory Deallocation and Garbage Collection
The CLR's garbage collector is responsible for reclaiming memory by removing inaccessible objects in the application. Unlike manual memory management, where developers must explicitly free memory, the CLR automatically manages this through garbage collection, which simplifies memory management but requires an understanding of how and when this process occurs.

Garbage collection in the CLR involves three main steps:

Marking: The GC identifies all objects still in use by following references from the root objects (such as global and static references, local variables, and CPU registers). Any objects not reachable from these roots are considered garbage.
Relocating: The GC then updates the references to the surviving objects to ensure that they point to the correct locations after compacting memory.
Compacting: The memory occupied by the unreachable (garbage) objects is reclaimed, and the remaining objects are moved closer together in memory. This compaction step reduces fragmentation and makes future memory allocations more efficient.

The CLR uses the generational approach to garbage collection in .NET, designed to optimize performance by reducing the amount of memory that needs to be examined and reclaimed. Generation 0 collections occur frequently but are fast because most objects in this generation are short-lived and can be quickly reclaimed. Generation 1 collections are less frequent but handle objects that have survived at least one garbage collection. Generation 2 collections are the most comprehensive and involve long-lived objects that have survived multiple collections. These collections are slower and more resource-intensive.

Best Practices for Managing Memory in .NET
Understanding how the CLR handles memory allocation and deallocation can guide you in writing more efficient code. Here are a few best practices:

Minimize the Creation of Large Objects: Large objects (greater than 85,000 bytes) are allocated in a special section of the heap called the Large Object Heap (LOH), which is not compacted due to the overhead associated with moving large blocks of memory. Large objects should be used judiciously because they are expensive to allocate and manage.
Use `IDisposable` and `using` Statements: Implementing the `IDisposable` interface and using `using` statements ensures that unmanaged resources are released promptly.
Profile Your Applications: Regularly use profiling tools to monitor memory usage and identify potential memory leaks or inefficiencies.

Conclusion
Mastering memory management in .NET is essential for building high-performance, reliable applications. By understanding the intricacies of the CLR, garbage collection, and best practices in memory management, you can optimize your applications to run more efficiently and avoid common pitfalls like memory leaks and fragmentation.

Effective .NET Memory Management, written by Trevoir Williams, is your essential guide to mastering the complexities of memory management in .NET programming. This comprehensive resource equips developers with the tools and techniques to build memory-efficient, high-performance applications.
The book delves into fundamental concepts like:

Memory Allocation and Garbage Collection
Memory Profiling and Optimization Strategies
Low-Level Programming with Unsafe Code

Through practical examples and best practices, you'll learn how to prevent memory leaks, optimize resource usage, and enhance application scalability. Whether you're developing desktop, web, or cloud-based applications, this book provides the insights you need to manage memory effectively and ensure your .NET applications run smoothly and efficiently.

Author Bio
Trevoir Williams, a passionate software and system engineer from Jamaica, shares his extensive knowledge with students worldwide. Holding a Master's degree in Computer Science with a focus on Software Development and multiple Microsoft Azure Certifications, his educational background is robust. His diverse experience includes software consulting, engineering, database development, cloud systems, server administration, and lecturing, reflecting his commitment to technological excellence and education. He is also a talented musician, showcasing his versatility. He has penned works like Microservices Design Patterns in .NET and Azure Integration Guide for Business. His practical approach to teaching helps students grasp both theory and real-world applications.


Getting Started with Test-Driven Development in Angular

Ezéchiel Amen AGBLA
09 Dec 2024
10 min read
Introduction
Test-Driven Development (TDD) is a software development process widely used in Angular development to ensure code quality and reduce the time spent debugging. TDD is an agile approach to software development that emphasizes the iterative writing of tests before writing the actual code. In this article, we'll take a look at how to serenely build an application based on the Test-Driven Development approach. We'll start with a brief look at the process of developing an application without TDD, then use the Test-First approach to better understand the benefits of TDD, and finish with a case study on cart processing in an e-commerce application.

Developing an application without using Test-Driven Development?
When we develop an application without using TDD, this is generally how we proceed:

First, we write the source code;
Next, we write the test corresponding to the source code;
And finally, we refactor our code.

As the application evolves, we find it hard to detect and manage bugs, because we always manage to write the tests so that they satisfy our code. As a result, we don't cover all the test cases we need to, and our application is very sensitive to the slightest modification. It's very easy to introduce regressions. In the next section, we'll look at how to develop an application using the Test-First approach.

Developing a Test-First application?
Developing an application using the Test-First approach means writing the tests before writing the source code, as the name indicates. Here's how we do it:

Write the exhaustive tests related to the feature we want to implement;
Then write the code associated with the tests;
Finally, refactor the code.

Using this approach, we are constrained to have an exhaustive vision of the features to be developed in our product, which means we can't evolve in an agile context. If tomorrow's features need to evolve or be completely modified, we'll have a very difficult time updating our source code. We'll also be very slow in developing features, because we'll have to write all the possible test cases before we start writing the associated code. Now that we've seen these two approaches, in the next section we'll move on to the main topic of our article, the Test-Driven Development approach.

Getting started with Test-Driven Development methodology
Test-Driven Development is a dynamic software development methodology that prioritizes the incremental creation of tests as minimalist code is implemented, while conforming to the User Story. In this section, we'll explore the role of TDD in application development, then we'll discover the core principle of TDD and the tools needed to successfully implement this approach.

What role does TDD play in application development?
TDD plays a crucial role in improving software quality, reducing bugs, and fostering better collaboration between developers and QA teams throughout the development process. We can summarize its role in three major points:

Quality: TDD guarantees high test coverage, so that bugs can be detected quickly and high-quality code maintained.
Design: By writing tests upstream, we think about the architecture of our code and obtain a clearer, more maintainable design.
Agility: TDD favors incremental and iterative development, making it easier to adapt to changing needs.

In the next section, we'll discover the core principle of TDD.

What is the core principle of TDD?
The TDD approach is based on a cycle known as the red-green-refactor cycle, which forms the core of this approach. Here's how it works:

Write a test before coding: We start by defining a small piece of functionality and writing a test that will fail until that functionality is implemented. We don't need to write the exhaustive tests related to the feature.
Just the right minimalist code: We then write the minimum code necessary for the test to pass.
Refactor: Once the test has passed, the code is improved without changing its behavior.
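To make the cycle concrete before the case study, here is a minimal sketch of a first red-green iteration in Angular, using Jasmine (Angular's default testing framework). The CartService class and the getItems and getTotalPrice method names are hypothetical illustrations of the pattern, not code from this article or the book.

// cart.service.spec.ts — written first; it fails until the service exists (red).
import { CartService } from './cart.service';

describe('CartService', () => {
  let cart: CartService;

  beforeEach(() => {
    cart = new CartService();
  });

  it('starts with an empty cart', () => {
    expect(cart.getItems().length).toBe(0);
  });

  it('returns a total price of 0 when the cart is empty', () => {
    expect(cart.getTotalPrice()).toBe(0);
  });
});

// cart.service.ts — just enough code to make the tests pass (green).
export class CartService {
  private items: { name: string; price: number }[] = [];

  getItems(): { name: string; price: number }[] {
    return this.items;
  }

  getTotalPrice(): number {
    return this.items.reduce((total, item) => total + item.price, 0);
  }
}

Running ng test would first show these specs failing; the minimal implementation then turns them green, after which the code can be refactored with the tests acting as a safety net.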
In the next section, we'll look at the tools needed to successfully implement this approach.

Tools for Test-Driven Development
For Angular projects, the default setup generated by the Angular CLI uses Jasmine as the testing framework and Karma as the test runner; end-to-end scenarios are typically covered with tools such as Protractor, Cypress, or Playwright.

Case study: << Managing a shopping cart on a sales site >>
In this case study, we'll implement some features of shopping cart management. The goal is very simple: as a developer, your task is to develop the shopping cart of a sales site using a Test-Driven Development approach. We'll do it step by step, following the principles of TDD.

When we're asked to implement the shopping cart of an e-commerce application, we have to stay focused on the application and ask ourselves the right questions to find the most basic scenario when a customer lands on the shopping cart page. In this scenario, the shopping cart is empty by default when a customer launches the application. So, we'll follow these steps for our first scenario, just as in the sketch shown earlier:

Write the test on the assumption that the customer's shopping cart is empty at initialization (the most basic scenario);
Write the minimal code associated with the test;
Refactor the code if necessary (we don't need any refactoring in our case).

Now, we can continue with our second scenario. Once we're satisfied that the basket is empty when the application is launched, we'll check that the total cart price is 0. We'll follow the principles of TDD and stay iterative in our code:

Write the test to calculate the total basket price when the basket is empty;
Write the minimal code associated with the test;
Refactor the code if necessary.

Conclusion
Test-Driven Development (TDD) is a powerful methodology that enhances code quality, fosters maintainable design, and promotes agility in software development. By focusing on writing tests before code and iterating through the red-green-refactor cycle, developers can create robust, bug-resistant applications. In this article, we've explored the differences between traditional development approaches and TDD, delved into its core principles, and applied the methodology in a practical e-commerce case study.

For a deeper dive into mastering TDD in Angular, consider reading Mastering Angular Test-Driven Development by Ezéchiel Amen AGBLA. With this book, you'll learn the fundamentals of TDD and discover end-to-end testing using Protractor, Cypress, and Playwright. Moreover, you'll improve your development process with Angular TDD best practices.

Author Bio
Ezéchiel Amen AGBLA is passionate about web and mobile development, with expertise in Angular and Ionic. A certified Ionic and Angular expert developer, he shares his knowledge through courses, articles, webinars, and training. He uses his free time to help beginners and spread knowledge by writing articles, including via OpenClassrooms, the leading online university in the French-speaking world. He also helps fellow developers on a part-time basis with code reviews, technical choices, and bug fixes. His dedication to continuous learning and experience sharing contributes to his growth as a developer and mentor.


Simplifying Web App Creation with Streamlit – Insights and Personal Reflections

Rosario Moscato
27 Nov 2024
5 min read
Introduction
In the ever-evolving world of web development, the need for rapid prototyping and efficient application creation has never been greater. "Web App Development Made Simple with Streamlit" is a book born out of this necessity. Streamlit, a powerful and intuitive tool, has transformed the way developers approach web app development, making it accessible to both seasoned programmers and beginners alike. This article delves into the core technologies discussed in the book, explores adjacent topics, and shares my personal journey in writing it.

In-Depth Analysis of Streamlit Technology
Streamlit is an open-source Python library that simplifies the development of web applications. Unlike traditional frameworks such as Django or Flask, Streamlit focuses on ease of use and rapid development. It allows developers to build interactive, data-driven web applications with minimal code.

Key Features of Streamlit
1. Simplicity and Speed: Streamlit's API is designed to be straightforward. With just a few lines of code, you can create a fully functional web app. This speed is crucial for data scientists and analysts who need to visualize data quickly.
2. Interactive Widgets: Streamlit provides a variety of widgets (e.g., sliders, buttons, and text inputs) that make it easy to create interactive applications. These widgets are essential for building user-friendly interfaces.
3. Real-Time Updates: Streamlit apps update in real-time as you modify the code, providing immediate feedback. This feature is particularly useful during the development phase, allowing for rapid iteration.
4. Integration with the Python Ecosystem: Streamlit seamlessly integrates with popular Python libraries such as NumPy, Pandas, and Matplotlib. This compatibility ensures that developers can leverage existing tools and libraries.

Solving Real-World Problems
Streamlit addresses several challenges in web app development:

Rapid Prototyping: Traditional web development frameworks often require significant setup and boilerplate code. Streamlit eliminates this overhead, enabling developers to focus on the core functionality of their applications.
Data Visualization: Streamlit excels in creating data-driven applications. Its integration with visualization libraries allows developers to build insightful dashboards and reports with ease.
Accessibility: Streamlit's simplicity makes web development accessible to non-experts. Data scientists and analysts, who may not have extensive web development experience, can create powerful applications without needing to learn complex frameworks.

Adjacent Topics: Best Practices and Emerging Trends
While Streamlit is a powerful tool, understanding best practices and emerging trends in web development is crucial for maximizing its potential.

Best Practices
1. Modular Code: Break your application into smaller, reusable components. This modular approach makes the codebase more manageable and easier to maintain.
2. Version Control: Use version control systems like Git to track changes and collaborate with others. Streamlit's simplicity should not deter you from following industry-standard practices.
3. Testing: Implement automated testing to ensure your application behaves as expected. Libraries like pytest can be integrated with Streamlit applications for this purpose. A small example that ties these practices together is sketched below.
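The following sketch is my own minimal illustration of these points, not an example from the book: a tiny Streamlit app whose data-preparation logic lives in a pure function, so it can be unit-tested with pytest independently of the UI. The file name (app.py) and the make_series helper are assumptions.

# app.py
import numpy as np
import pandas as pd
import streamlit as st

def make_series(n_points: int, noise: float) -> pd.DataFrame:
    """Pure helper: no Streamlit calls, so pytest can import and test it directly."""
    x = np.linspace(0, 10, n_points)
    y = np.sin(x) + np.random.normal(0, noise, n_points)
    return pd.DataFrame({"x": x, "y": y}).set_index("x")

st.title("Noisy sine explorer")
n_points = st.slider("Number of points", min_value=10, max_value=500, value=100)
noise = st.slider("Noise level", min_value=0.0, max_value=1.0, value=0.2)
st.line_chart(make_series(n_points, noise))

Saving this as app.py and running streamlit run app.py starts the app with live reloading, while a pytest test can simply import make_series and assert on the shape and index of the returned DataFrame without launching Streamlit at all.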
Emerging Trends
1. Low-Code/No-Code Platforms: Streamlit is part of a broader trend towards low-code and no-code platforms, which aim to democratize software development. These platforms enable users to build applications without extensive programming knowledge.
2. AI and Machine Learning Integration: Streamlit's ease of use makes it an ideal platform for deploying machine learning models. Expect to see more integration with AI tools and libraries in the near future.
3. Collaborative Development: Tools that facilitate collaboration among developers, data scientists, and domain experts are gaining traction. Streamlit's simplicity and real-time updates make it a natural fit for collaborative projects.

My Writing Journey
Writing "Web App Development Made Simple with Streamlit" was a journey filled with both challenges and triumphs. The idea for the book stemmed from my own experiences as a developer struggling with the complexities of traditional web development frameworks. I wanted to create a resource that would simplify the process and empower others to build their applications.

Challenges
1. Staying Updated: The world of technology is constantly evolving. Keeping up with the latest updates and features in Streamlit was a significant challenge. I spent countless hours researching and experimenting to ensure the book was up to date.
2. Balancing Depth and Accessibility: Striking the right balance between providing in-depth technical information and making the content accessible to beginners was a delicate task. I aimed to cater to a broad audience without overwhelming them with jargon.
3. Time Management: Writing a book while juggling professional and personal commitments required meticulous time management. Setting realistic deadlines and sticking to a writing schedule was crucial.

Overcoming Challenges
1. Community Support: The Streamlit community was an invaluable resource. Engaging with other developers, participating in forums, and attending webinars helped me stay informed and inspired.
2. Iterative Writing: I adopted an iterative writing process, revisiting and refining chapters based on feedback from beta readers. This approach ensured the content was both accurate and user-friendly.
3. Personal Passion: My passion for simplifying web development kept me motivated. The thought of helping others overcome the same challenges I faced was a powerful driving force.

Problem-Solving with Streamlit
The primary problem I aimed to solve with Streamlit was the steep learning curve associated with traditional web development frameworks. Many developers, especially those transitioning from data science or academia, struggle with the intricacies of setting up servers, managing databases, and creating user interfaces.

Significance of the Problem
Barrier to Entry: The complexity of traditional frameworks can deter aspiring developers from pursuing web development. Streamlit lowers this barrier, making it accessible to a wider audience.
Inefficiency: Time spent on boilerplate code and setup can be better utilized for developing core functionalities. Streamlit's simplicity allows developers to focus on solving real-world problems.

Solution
Streamlit provides a streamlined approach to web development. By abstracting away the complexities, it empowers developers to create applications quickly and efficiently. My book offers practical examples and step-by-step guides to help readers harness the full potential of Streamlit.

Unique Insights and Experiences
"Web App Development Made Simple with Streamlit" offers several unique insights, drawn from my personal experiences and expertise.

Key Insights
1. Real-World Applications: The book features case studies and real-world examples that demonstrate how Streamlit can be used to solve practical problems. These examples provide readers with a clear understanding of the tool's capabilities.
2. Hands-On Approach: Each chapter includes hands-on exercises and projects, encouraging readers to apply what they've learned. This interactive approach ensures a deeper understanding of the material.
3. Community Contributions: The book highlights contributions from the Streamlit community, showcasing innovative applications and best practices. This collaborative spirit reflects the open-source nature of Streamlit.

Personal Experiences
Learning from Mistakes: My journey with Streamlit was not without its mistakes. I share these experiences in the book, offering readers valuable lessons and tips to avoid common pitfalls.
Continuous Learning: Writing the book was a learning experience in itself. I discovered new features, explored advanced use cases, and deepened my understanding of web development.

Conclusion
"Web App Development Made Simple with Streamlit" is more than just a technical guide; it's a testament to the power of simplicity and accessibility in web development. Streamlit has revolutionized the way we approach application development, making it possible for anyone to create powerful, interactive web apps. Through this article and my book, I hope to inspire and empower others to embark on their own development journeys, armed with the knowledge and tools to succeed.

Author Bio
Rosario Moscato has a master's degree in electronic engineering, a second-level master's in internet software design, and a first-level master's in science and faith. In about 25 years of experience, he has worked on innovative technology development in Europe and Asia. Recently, his interests have been focused exclusively on AI, pursuing the goal of making every business extremely competitive and analyzing the ethical implications deriving from the new scenarios that these disciplines open. Rosario has authored two books, and he is a speaker at international research centres and conferences as well as a trainer and technical/scientific consultant. Currently, he is working as CTO with one of the oldest AI companies in Italy.


Scripting with Postman: A Practical Guide to Postman Scripting and Security

Confidence Staveley
11 Nov 2024
15 min read
Introduction
APIs are everywhere these days. In fact, they were responsible for a whopping 73% of internet traffic in 2023, according to the State of API Security in 2024 Report by Imperva. With this level of activity, especially in industries like banking and online retail, securing APIs isn't just important—it's essential. The average organization has around 613 API endpoints in production, and as the pressure to deliver faster mounts, that number is only growing. To keep up with this demand while ensuring security, adopting a 'Shift-left' approach is crucial. What does that mean? It means integrating security earlier in the development process—right from the design stage, all the way to deployment. By doing this, you're not just patching up vulnerabilities at the end but embedding security into the very fabric of your API.

Getting Started with API Testing
API testing plays a huge role in this approach. You're essentially poking and prodding at the logic layer of your application, checking how it responds, how fast it does so, how accurately it handles data, and whether it can fend off security threats. This is where Postman shines. It's a widely used tool that's loved for its ease of use and versatility, making it a perfect fit for your shift-left strategy. With Postman, you can simulate attack scenarios, test for security issues, and validate security measures, all within the same space where you build your APIs. But before we dive into scripting in Postman, let's get it installed.

Installing Postman and Setting Up
First things first, if you don't already have a Postman account, head over to their website to create one. You can sign up using Google if you prefer. Once that's done, download the version that suits your operating system and get it installed. We'll need a vulnerable API to test our scripts, and the BreachMe API in my book (API Security For White Hat Hackers) is perfect for this. You can find it here. Follow the documentation to set it up, and don't forget to import the BreachMe collection into Postman. Just click the import button in the collections tab, and you're good to go.

Postman Scripting Basics
Scripts in Postman are where things get really interesting. They allow you to add dynamic behavior to your API requests, mimicking complex workflows and writing test assertions that simulate real-world scenarios. Postman's sandbox execution environment runs JavaScript, which means that any script you want Postman to execute has to be written in JavaScript. If you're familiar with JavaScript, you're already halfway there.

There are two main types of scripts in Postman:

Pre-request Scripts: These run before a request is sent to the API.
Post-response Scripts: These kick in after the API responds.

You can run these scripts at three levels: the request level, the folder level, and the collection level. This flexibility means you can apply scripts to individual requests, a group of requests, or even an entire collection. For a single request, scripts execute in the following order: collection-level scripts run first, then folder-level scripts, and finally the scripts attached to the request itself, with pre-request scripts running before the request is sent and post-response scripts running after the response comes back.
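As a quick illustration (my own sketch, not taken from the article), here is a small pre-request script that could be attached at the collection level. It resolves one of Postman's built-in dynamic variables, records a timestamp in a collection variable, and adds a correlation header before the request goes out; the X-Request-Id header name and the request_started_at variable name are my own choices.

// Pre-request script (sketch): runs before the request is sent.
const requestId = pm.variables.replaceIn('{{$randomUUID}}'); // resolve a built-in dynamic variable
pm.collectionVariables.set('request_started_at', Date.now()); // remember when this request was fired

// Attach a correlation header so API logs can be matched to this test run.
pm.request.headers.add({
    key: 'X-Request-Id',
    value: requestId
});

A post-response script in the same collection could then read request_started_at to log a rough round-trip time next to the X-Request-Id value.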
Dynamic Variables and API Testing
During API testing, you often need to work with various user inputs, which can be tedious to create manually each time. One of the coolest features of Postman scripting is the ability to add dynamic behavior to a request. Imagine trying to manually create user inputs for each test—it would be a nightmare. Postman allows you to automate this process, generating variables like random usernames, random IP addresses, email addresses, and passwords on the fly. For example, to generate a dynamic username you can use {{$randomUserName}}. Want a dynamic email? Just use {{$randomEmail}}. And for a password, {{$randomPassword}} has you covered. This makes it easy to send multiple requests to the register endpoint, effectively testing the API. Postman provides everything we need, and we can now send as many register requests as necessary to test the API effectively. Dynamic variables can also be set using pre-request scripts.

Post-Response Scripts for Functional Testing
Postman can be used to perform essential API testing such as functional testing, which is testing to ensure that the API works the way it is intended to. When testing functionality, Postman allows you to send requests to your API endpoints and validate the responses against expected outcomes. You can check if the API returns the correct data, handles inputs properly, and responds with appropriate status codes (e.g., 200 OK, 404 Not Found). Let's try that in the above API to check if the login endpoint returns a 200 status code. Navigate to the login endpoint's script tab and choose the post-response tab. The script we will use is a single pm.test() call. Let's break down the snippet: the first argument, "Status code is 200", is used as the description of the test. The second argument is a function that contains the actual test. pm.response refers to the response object, and .to.have.status(200) evaluates whether the response status code is 200.

Post-response scripts can also be used to set tokens or variables that will be needed throughout the testing of the API. Imagine testing an API and manually copying and pasting variables between requests—tiring, right? This approach ensures the variable is accessible across all requests in the collection, making your testing workflow more efficient, less error-prone, and more secure. Some variables contain sensitive data and may require a bit more protection, especially when working in a collaborative environment; Postman recommends using variables in such cases. Let's take an example of an access token that is short-lived, used by most of the endpoints in the collection, and set when the login request is successful. To streamline this, we can use a post-response script in the login endpoint to set it automatically. Navigate to the auth folder of the Breachme_API collection and select the login endpoint. Make sure the username you are trying to log in as belongs to a registered user by calling the register endpoint before the login endpoint. When logging in, you'll need a correct username and password in the body of the request. Correct credentials will result in a response containing the token. To set it, we need to read the response, take only the token, and store it.
The script will look like this:

var theResponse = pm.response.json();
pm.collectionVariables.set("access_token", theResponse.token);

The first line of code captures the response from the API request, converts it to a JSON object, and stores it in a variable called theResponse. pm.response.json() is a Postman function that parses the response body as JSON, making it accessible as a JavaScript object. With the response accessible, we can then get the token using theResponse.token and set it as a collection variable with the pm.collectionVariables.set() function. The first parameter specifies the name of the collection variable you want to save it as.

Postman scripts can also be used to validate whether the response contains the expected data. Let's say you have created a post; you would expect it to have an 'id', 'username', 'message', and maybe an 'image'. You can use Postman to check if every expected piece of data is returned in the expected format. Let's check if the register endpoint returns what we expect, with the body:

{
   "username":"user2",
   "email":"user2@email.com",
   "password":"user2password"
}

We expect the response to look like the following:

{
   "message": "user created successfully",
   "user": {
       "id": 2,
       "email": "user2@email.com",
       "username": "user2",
       "is_admin": false,
       "password": "#############",
       "createdAt": "2024-08-28T22:13:30.000Z"
   },
   "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6MiwiZW1haWwiOiJ1c2VyMkBlbWFpbC5jb20iLCJ1c2VybmFtZSI6InVzZXIyIiwiaXNfYWRtaW4iOmZhbHNlLCJwYXNzd29yZCI6IiMjIyMjIyMjIyMjIyMiLCJjcmVhdGVkQXQiOiIyMDI0LTA4LTI4VDIyOjEzOjMwLjAwMFoiLCJpYXQiOjE3MjQ4ODMyMTAsImV4cCI6MTcyNTE0MjQxMH0.Z3fdfRXkePNFoWgX2gSqrTTtOy_AzsnG8yG_wKdnOz4"
}

To automate it, we will use the following script in the post-response tab of the register endpoint:

pm.test("User object has id, email, username, is_admin, password", function () {
   const responseData = pm.response.json();
   pm.expect(responseData.user).to.have.property("id");
   pm.expect(responseData.user).to.have.property("email");
   pm.expect(responseData.user).to.have.property("username");
   pm.expect(responseData.user).to.have.property("is_admin");
   pm.expect(responseData.user).to.have.property("password");
});

To ensure that your API meets performance requirements, you can use a post-response script to measure the response time. The snippet we will use is shown below:

pm.test("Response time is less than 300ms", function () {
   pm.expect(pm.response.responseTime).to.be.below(300);
});

The above script uses the pm.test() function with the test description as the first argument and an anonymous function that contains the actual test as the second argument. The pm.expect() function is a Postman function used to make assertions; it sets up an expectation for a certain condition. In this case, it expects that pm.response.responseTime will be below 300 milliseconds. Not meeting this expectation makes the test fail.
"As you continue to deepen your understanding of API security and testing, consider exploring "API Security for White Hat Hackers" written by Confidence Staveley. This book is a comprehensive guide that simplifies API security by showing you how to identify and fix vulnerabilities. From emerging threats to best practices, this book helps you defend and safeguard your APIs.Author BioConfidence Staveley is a multi-award-winning cybersecurity leader with a background in software engineering, specializing in application security and cybersecurity strategy. Confidence excels in translating cybersecurity concepts into digestible insights for diverse audiences. Her YouTube series, “API Kitchen,” explains API security using culinary metaphors.nConfidence holds an advanced diploma in software engineering, a bachelor’s degree in IT and business information systems, and a master’s degree in IT management from the University of Bradford, as well as numerous industry certifications such as CISSP, CSSLP, and CCISO. In addition to her advisory roles on many boards, Confidence is the founder of CyberSafe Foundation and MerkleFence.

Mastering Machine Learning: Best Practices and the Future of Generative AI for Software Engineers

Miroslaw Staron
25 Oct 2024
10 min read
Introduction
The field of machine learning (ML) and generative AI has rapidly evolved from its foundational concepts, such as Alan Turing's pioneering work on intelligence, to the sophisticated models and applications we see today. While Turing's ideas centered on defining and detecting intelligence, modern applications stretch the definition and utility of intelligence in the realm of artificial neural networks, language models, and generative adversarial networks. For software engineers, this evolution presents both opportunities and challenges, from creating intelligent models to harnessing tools that streamline development and deployment processes. This article explores best practices in machine learning, insights on deploying generative AI in real-world applications, and the emerging tools that software engineers can utilize to maximize efficiency and innovation.

Exploring Machine Learning and Generative AI: From Turing's Legacy to Today's Best Practices
When Alan Turing developed his famous Turing test for intelligence, computers and software were completely different from what we are used to now. I'm certain that Turing did not think about Large Language Models (LLMs), Generative AI (GenAI), Generative Adversarial Networks, or Diffusers. Yet, this test for intelligence is equally useful today as it was at the time when it was developed. Perhaps our understanding of intelligence has evolved since then. We consider intelligence on different levels, for example, at the philosophical level and the computer science level. At the philosophical level, we still try to understand what intelligence really is, how to measure it, and how to replicate it. At the computer science level, we develop new algorithms that can tackle increasingly complex problems, utilize increasingly complex datasets, and provide more complex output.

In the following figure, we can see two different solutions to the same problem. On the left-hand side, the solution to the Fibonacci problem uses good old-fashioned programming, where the programmer translates the solution into a program. On the right-hand side, we see a machine learning solution – the programmer provides example data and uses an algorithm to find the pattern, just to replicate it later.

Figure 1. Fibonacci problem solved with a traditional algorithm (left-hand side) and machine learning's linear regression (right-hand side). Although the traditional way is slow, it can be mathematically proven to be correct for all numbers, whereas the machine learning algorithm is fast, but we do not know if it renders correct results for all numbers.

Although the above is a simple example, it illustrates that the difference between a model and an algorithm is not that great. Essentially, the machine learning model on the right is a complex function that takes an input and produces an output. The same is true for the generative AI models.
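The following Python sketch is my own reconstruction of the idea behind Figure 1, not the code used in the book: a classic loop that is provably correct for every n, next to a regression model fitted on a handful of examples that is quick to evaluate but comes with no correctness guarantee.

import numpy as np
from sklearn.linear_model import LinearRegression

def fibonacci(n: int) -> int:
    # Traditional algorithm: provably correct for every non-negative n.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Machine learning flavour: learn to predict F(n) from n using a few examples.
X = np.arange(1, 15).reshape(-1, 1)
y = np.array([fibonacci(int(n)) for n in X.ravel()], dtype=float)
model = LinearRegression().fit(X, y)

n = 10
print("algorithm:", fibonacci(n))            # exact answer: 55
print("model:    ", model.predict([[n]])[0])  # fast to evaluate, but only an approximation

The point is not that a linear model is a good fit for an exponential sequence; it is that the trained model is just a function learned from data, and nothing in the training process proves it is correct for inputs it has never seen.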
Generative AI
Generative AI is much more complex than the algorithms used for Fibonacci, but it works in the same way – based on the data, it creates new output. Instead of predicting the next Fibonacci number, LLMs predict the next token, and diffusers predict the values of new pixels. Whether that is intelligence, I am not qualified to judge. What I am qualified to say is how to use these kinds of models in modern software engineering.

When I wrote the book Machine Learning Infrastructure and Best Practices for Software Engineers, we could already see how powerful ChatGPT 3.5 is. In my profession, software engineers use it to write programs, debug them, and even improve the performance of their programs. I call it being a superprogrammer. Suddenly, when software engineers get these tools, they become team leaders for their bots, who support them – these bots are the copilots for the software engineers. But using these tools and models is just the beginning.

Harnessing NPUs and Mini LLMs for Efficient AI Deployment
Neural Processing Units (NPUs) have started to become more popular in modern computers, which addresses the challenge of running language models locally, without access to the internet. Local execution reduces latency and lowers the security risk of information being hijacked while it travels between the model and the client. However, NPUs are significantly less powerful than data centers, and therefore we can only use them with small language models – so-called mini-LLMs. An example of such a model is the Phi-3-mini model developed by Microsoft. In addition to NPUs, frameworks like ONNX appeared, which made it possible to quickly interchange models between GPUs and NPUs – you can train a model on a powerful GPU and use it on a small NPU thanks to these frameworks.

Since AI takes up so much space in modern hardware and software, Geekbench AI is a benchmark suite that allows us to quantify and compare the AI capabilities of modern hardware. I strongly recommend taking it for a spin to check what we can do with the hardware we have at hand. Now, hardware is only as good as the software, and there, we have also seen a lot of important improvements.

Ollama and LLM frameworks
In my book, I presented the methods and tools to work with generative AI (as well as classical ML). It's a solid foundation for designing, developing, testing, and deploying AI systems. However, if we want to utilize LLMs without the hassle of setting up the entire environment, we can use frameworks like Ollama. The Ollama framework seamlessly downloads and deploys LLMs on a local machine if we have enough resources. Once the framework is installed, we can type ollama run phi-3 to start a conversation with the model. The framework provides a set of user interfaces, web services, and other mechanisms needed to construct fully-fledged machine learning software. We can use it locally for all kinds of tasks, e.g., in finance.
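Once an Ollama server is running locally, it can also be called programmatically. The sketch below is my own illustration, not an example from the book: it posts a prompt to Ollama's local REST endpoint. The port, the /api/generate path, and the phi3 model tag are assumptions based on Ollama's standard setup and may differ on your machine.

import json
import urllib.request

payload = {
    "model": "phi3",                        # assumes the model has already been pulled locally
    "prompt": "Explain in two sentences what an NPU is.",
    "stream": False,                        # request a single JSON response instead of a stream
}

request = urllib.request.Request(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    answer = json.loads(response.read())
    print(answer["response"])

Because everything runs on the local machine, no prompt or completion ever leaves the device, which is exactly the latency and security argument made above for NPUs and mini-LLMs.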
What's Next: Embracing the Future of AI in Software Engineering
As generative AI continues to evolve, its role in software engineering is set to expand in exciting ways. Here are key trends and opportunities that software engineers should focus on to stay ahead of the curve:

Mastering AI-Driven Automation: AI will increasingly take over repetitive programming and testing tasks, allowing engineers to focus on more creative and complex problems. Engineers should leverage AI tools like GitHub Copilot and Ollama to automate mundane tasks such as bug fixing, code refactoring, and even performance optimization.
Actionable Step: Start integrating AI-driven tools into your development workflow. Experiment with automating unit tests, continuous integration pipelines, or even deployment processes using AI.

AI-Enhanced Collaboration: Collaboration with AI systems, or "AI copilots," will be a crucial skill. The future of software engineering will involve not just individual developers using AI tools but entire teams working alongside AI agents that facilitate communication, project management, and code integration.
Actionable Step: Learn to delegate tasks to AI copilots and explore collaborative platforms that integrate AI to streamline team efforts. Tools like Microsoft Teams and GitHub Copilot integrated with AI assistants are a good start.

On-device AI and Edge Computing: The rise of NPUs and mini-LLMs signals a shift towards on-device AI processing. This opens opportunities for real-time AI applications in areas with limited connectivity or stringent privacy requirements. Software engineers should explore how to optimize and deploy AI models on edge devices.
Actionable Step: Experiment with deploying AI models on edge devices using frameworks like ONNX and test how well they perform on NPUs or embedded systems.

To stay competitive and relevant, software engineers need to continuously adapt by learning new AI technologies, refining their workflows with AI assistance, and staying attuned to emerging ethical challenges. Whether by mastering AI automation, optimizing edge deployments, or championing ethical practices, the future belongs to those who embrace AI as both a powerful tool and a collaborative partner. For software engineers ready to dive deeper into the transformative world of machine learning and generative AI, Machine Learning Infrastructure and Best Practices for Software Engineers offers a comprehensive guide packed with practical insights, best practices, and hands-on techniques.

Conclusion
As generative AI technologies continue to advance, software engineers are at the forefront of a new era of intelligent and automated development. By understanding and implementing best practices, engineers can leverage these tools to streamline workflows, enhance collaborative capabilities, and push the boundaries of what is possible in software development. Emerging hardware solutions like NPUs, edge computing capabilities, and advanced frameworks are opening new pathways for deploying efficient AI solutions. To remain competitive and innovative, software engineers must adapt to these evolving technologies, integrating AI-driven automation and collaboration into their practices and embracing the future with curiosity and responsibility. This journey not only enhances technical skills but also invites engineers to become leaders in shaping the responsible and creative applications of AI in software engineering.

Author Bio
Miroslaw Staron is a professor of Applied IT at the University of Gothenburg in Sweden with a focus on empirical software engineering, measurement, and machine learning. He is currently editor-in-chief of Information and Software Technology and co-editor of the regular Practitioner's Digest column of IEEE Software. He has authored books on automotive software architectures, software measurement, and action research. He also leads several projects in AI for software engineering and leads an AI and digitalization theme at Software Center. He has written over 200 journal and conference articles.


From C++98 to C++23: The Arithmetic Mean, Benchmarked and Optimized

Ferenc Deak
27 Mar 2025
10 min read
Averaging algorithms are computational techniques used to determine the mean or average value of a set of numbers. They are widely used in machine learning (model optimization), signal processing (noise reduction), finance (trend analysis), and networking (traffic smoothing). Some common averaging algorithms include the Arithmetic Mean (Simple Average), the Moving Average (Sliding Window Average), the Harmonic Mean, and the Geometric Mean.

In this article we will explore one of the most common averaging algorithms, the Arithmetic Mean, and its implementation in C++. We will explore how to compute the average of a vector using five different approaches, each corresponding to a major C++ standard, from C++98 to C++23. Additionally, we will benchmark these implementations to analyze their performance and efficiency. Finally, we will dig into the code generated by the two most used compilers on Linux platforms, GCC and Clang, perform a thorough analysis, and see which one comes out the winner.

Readers of this article will gain a deeper understanding of practical C++ implementation across different standards: you will see how to compute the arithmetic mean of a vector using various C++ techniques and understand the evolution of the language from C++98 to C++23. You will also gain insights into optimizing code for performance, while keeping modern C++ coding practices in mind, and learn how they can be used to write efficient and readable code.

A Brief History of C++ Standards
C++ was first standardized in 1998 as C++98, which provided a stable foundation for the language. It was followed by C++03, a minor update that mainly focused on defect fixes, without significant changes to the language mechanics or syntax. The major overhaul came with C++11, which introduced modern programming features such as auto, lambda expressions, smart pointers, and noteworthy syntax changes. C++14 refined these features and introduced new constructs to the Standard Template Library (STL), while C++17 improved performance with parallel algorithms and introduced std::optional, together with new compile-time expressions and the removal of some older language syntax. C++20 brought revolutionary changes like concepts, ranges, and coroutines, enhancing the expressiveness of the language, and on top of that introduced new syntax and operators too. Finally, C++23 builds upon these improvements with even more enhancements to ranges and better support for functional-style programming.

The question naturally arises: why this quick detour through C++ history? Three main reasons drive this. First, we aim to emphasize the remarkable evolution of the language itself. This context is essential for understanding what we'll demonstrate next: five distinct implementations of a basic algorithm. Second, we're keen to explore the increasing expressiveness of C++ syntax across its versions. And finally, driven by curiosity, we'll also touch upon the question of performance — does adopting a newer standard automatically lead to faster code?

Computing the Average: Five Approaches
Let us now delve into five versions of a function that calculates the average of a vector of random integers. Each version utilizes a different C++ standard, demonstrating how modern C++ simplifies common programming tasks and makes them more readable. Before introducing them, here is the code that generates the vector.
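The generator below is my own reconstruction, not the author's original listing; the 0–100 value range is an assumption that is merely consistent with the reported averages of roughly 50.46.

#include <random>
#include <vector>

std::vector<int> generate_random_vector(std::size_t size)
{
    std::mt19937 engine{std::random_device{}()};
    std::uniform_int_distribution<int> distribution{0, 100};

    std::vector<int> values(size);
    for (auto& value : values)
    {
        value = distribution(engine);
    }
    return values;
}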
1. C++98: Classic for-loop Approach
In C++98, we relied on manual iteration using a basic for loop (see the consolidated sketch at the end of this article). This approach was straightforward but lacked modern conveniences like range-based loops or built-in algorithms.

2. C++11: Range-based Loop
With C++11, we gained access to range-based loops, making iteration more concise and readable. This improved readability by removing explicit indexing and is one of the many usability improvements introduced in C++11.

3. C++14: Using std::accumulate
For the C++14 version, we lean on functional-style algorithms such as std::accumulate, which simplify summing the elements. This approach is more declarative and eliminates the need for an explicit loop. C++14 focused on minor refinements and usability improvements over C++11.

4. C++17: Using std::reduce
C++17 introduced std::reduce, which is optimized for parallel execution. std::reduce provides potential performance gains by allowing parallel execution when used with std::execution::par. C++17 marked a shift towards greater use of parallelism and optimization in the Standard Library.

5. C++23: Using std::ranges::fold_left
With C++23, we can leverage std::ranges::fold_left to compute the sum in a more functional programming style. This is the most modern and expressive approach, benefiting from the new std::ranges library. C++23 continues to improve upon the functional and declarative paradigms introduced in C++20.

Benchmarking: Performance Comparison
To analyze the performance of these implementations, we have developed a benchmarking function that executes each function 1000 times and calculates the average execution time in nanoseconds (a sketch of such a harness is also included at the end of this article). Averaging over many runs ensures more reliable results by mitigating fluctuations caused by system variations.

Results
We tested these implementations on a randomly generated vector of 500,000 elements, on a machine with a 13th Gen Intel i7-13700H (20) @ 4.800GHz CPU. Here's a sample output using the latest versions of the two major compilers as of February 2025. (However, please take this with a pinch of salt: like everything related to timed execution on a modern operating system, the numbers are affected by context switches, multiple processes being needy, Linux juggling between them, and so on.)

Here are two example outputs:

Compiled by: clang++ - 21.0.0
Vector size: 500000 elements
C++98 Average: 50.458700 | Avg Time: 42083.258000 ns (over 1000 runs)
C++11 Average: 50.458700 | Avg Time: 40607.151000 ns (over 1000 runs)
C++14 Average: 50.458700 | Avg Time: 40443.039000 ns (over 1000 runs)
C++17 Average: 50.458700 | Avg Time: 72351.570000 ns (over 1000 runs)
C++23 Average: 50.458700 | Avg Time: 44660.055000 ns (over 1000 runs)
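For reference, here is a consolidated sketch of the five implementations and a simple timing harness, reconstructed from the descriptions above. The function names and details (such as summing into a long long) are my assumptions, not the author's original listings.

#include <algorithm>   // std::ranges::fold_left (C++23)
#include <chrono>
#include <cstddef>
#include <execution>   // std::execution::par (C++17)
#include <functional>  // std::plus
#include <numeric>     // std::accumulate, std::reduce
#include <vector>

// 1. C++98: classic indexed for loop
double average_cpp98(const std::vector<int>& v)
{
    long long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        sum += v[i];
    }
    return static_cast<double>(sum) / v.size();
}

// 2. C++11: range-based for loop
double average_cpp11(const std::vector<int>& v)
{
    long long sum = 0;
    for (int value : v)
    {
        sum += value;
    }
    return static_cast<double>(sum) / v.size();
}

// 3. C++14 style: std::accumulate
double average_cpp14(const std::vector<int>& v)
{
    const long long sum = std::accumulate(v.begin(), v.end(), 0LL);
    return static_cast<double>(sum) / v.size();
}

// 4. C++17: std::reduce with a parallel execution policy
double average_cpp17(const std::vector<int>& v)
{
    const long long sum = std::reduce(std::execution::par, v.begin(), v.end(), 0LL);
    return static_cast<double>(sum) / v.size();
}

// 5. C++23: std::ranges::fold_left
double average_cpp23(const std::vector<int>& v)
{
    const long long sum = std::ranges::fold_left(v, 0LL, std::plus<>{});
    return static_cast<double>(sum) / v.size();
}

// A simple timing harness in the spirit of the benchmark described above.
template <typename Fn>
double benchmark_ns(Fn&& fn, int runs = 1000)
{
    using clock = std::chrono::steady_clock;
    double total_ns = 0.0;
    for (int i = 0; i < runs; ++i)
    {
        const auto start = clock::now();
        volatile double result = fn();   // volatile discourages the optimizer from dropping the call
        (void)result;
        const auto stop = clock::now();
        total_ns += std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    }
    return total_ns / runs;
}

A call such as benchmark_ns([&] { return average_cpp14(values); }) then yields the average time per run in nanoseconds; note that with GCC's libstdc++ the std::execution::par policy typically requires linking against Intel TBB.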


Empowering Modern Graphics Programming using Vulkan

Preetish Kakkar
04 Nov 2024
10 min read
Introduction
In the rapidly evolving world of computer graphics, Vulkan has emerged as a powerful and efficient API, revolutionizing how developers approach rendering and compute operations. As the author of "The Modern Vulkan Cookbook," I've had the privilege of diving deep into this technology, exploring its intricacies, and uncovering its potential to solve real-world problems in graphics programming. This book will help you leverage modern graphics programming techniques. You'll cover a cohesive set of examples that use the same underlying API, discovering Vulkan concepts and their usage in real-world applications.

Vulkan, introduced by the Khronos Group in 2016, was designed to address the limitations of older graphics APIs like OpenGL. Its low-overhead, cross-platform nature has made it increasingly popular among developers seeking to maximize performance and gain fine-grained control over GPU resources. One of Vulkan's key strengths lies in its ability to efficiently utilize modern multi-core CPUs and GPUs. By providing explicit control over synchronization and memory management, Vulkan allows developers to optimize their applications for specific hardware configurations, resulting in significant performance improvements.

Vulkan Practical Applications
Vulkan's impact on solving real-world problems in graphics programming is profound and far-reaching. In the realm of mobile gaming, Vulkan's efficient use of system resources has enabled developers to create console-quality graphics on smartphones, significantly enhancing the mobile gaming experience while conserving battery life. In scientific visualization, Vulkan's compute capabilities have accelerated complex simulations, allowing researchers to process and visualize large datasets in real-time, leading to breakthroughs in fields like climate modeling and molecular dynamics. The film industry has leveraged Vulkan's ray tracing capabilities to streamline pre-visualization processes, reducing production times and costs. In automotive design, Vulkan-powered rendering systems have enabled real-time, photorealistic visualizations of car interiors and exteriors, revolutionizing the design review process. Virtual reality applications built on Vulkan benefit from its low-latency characteristics, reducing motion sickness and improving overall user experience in training simulations for industries like healthcare and aerospace. These practical applications demonstrate Vulkan's versatility in solving diverse challenges across multiple sectors, from entertainment to scientific research and industrial design.

Throughout my journey writing "The Modern Vulkan Cookbook," I encountered numerous scenarios where Vulkan's capabilities shine in solving practical challenges:

GPU-Driven Rendering: Vulkan's support for compute shaders and indirect drawing commands enables developers to offload more work to the GPU, reducing CPU overhead and improving overall rendering efficiency. This is particularly beneficial for complex scenes with dynamic object counts or procedurally generated geometry.
Advanced Lighting and Shading: Vulkan's flexibility in shader programming allows for the implementation of sophisticated lighting models and material systems. Techniques like physically based rendering (PBR) and global illumination become more accessible and performant under Vulkan.
Vulkan's support for advanced rendering techniques, such as A-buffer implementations or depth peeling, provides developers with powerful tools to tackle this issue effectively. Ray Tracing: With the introduction of ray tracing extensions, Vulkan has opened new possibilities for photorealistic rendering in real-time applications. This has profound implications for industries beyond gaming, including architecture visualization and film production. Challenges and Learning Curves Despite its power, Vulkan comes with a steep learning curve. Its verbose nature and explicit control can be daunting for newcomers. During the writing process, I faced the challenge of breaking down complex concepts into digestible chunks without sacrificing depth. This led me to develop a structured approach, starting with core concepts and gradually building up to advanced techniques. One hurdle was explaining the intricacies of Vulkan's synchronization model. Unlike older APIs, Vulkan requires explicit synchronization, which can be a source of confusion and errors for many developers. To address this, I dedicated significant attention to explaining synchronization primitives and their proper usage, providing clear examples and best practices. The Future of Graphics Programming with Vulkan As we look to the future, Vulkan's role in graphics programming is set to grow even further. The API continues to evolve, with new extensions and features being added regularly. Some exciting areas of development include: Machine Learning Integration: The intersection of graphics and machine learning is becoming increasingly important. Vulkan's compute capabilities make it well-suited for implementing ML algorithms directly on the GPU, opening up possibilities for AI-enhanced rendering techniques. Extended Reality (XR): With the rising popularity of virtual and augmented reality, Vulkan's efficiency and low-latency characteristics make it an excellent choice for XR applications. The integration with OpenXR further solidifies its position in this space. Cross-Platform Development: As Vulkan matures, its cross-platform capabilities are becoming more robust. This is particularly valuable for developers targeting multiple platforms, from high-end PCs to mobile devices and consoles. Conclusion Writing "The Modern Vulkan Cookbook" has been an enlightening journey, deepening my appreciation for the power and flexibility of Vulkan. As graphics hardware continues to advance, APIs like Vulkan will play an increasingly crucial role in harnessing this power efficiently. For developers looking to push the boundaries of what's possible in real-time rendering, Vulkan offers a robust toolset. While the learning curve may be steep, the rewards in terms of performance, control, and cross-platform compatibility make it a worthy investment for any serious graphics programmer. Author Bio Preetish Kakkar is a highly experienced graphics engineer specializing in C++, OpenGL, WebGL, and Vulkan. He co-authored "The Modern Vulkan Cookbook" and has extensive experience developing rendering engines, including rasterization and ray-traced pipelines. Preetish has worked with various engines like Unity, Unreal, and Godot, and libraries such as bgfx. He has a deep understanding of the 3D graphics pipeline, virtual/augmented reality, physically based rendering, and ray tracing. 

Creating Custom Tools in Unity: Automating Repetitive Tasks

Mohamed Essam
17 Oct 2024
5 min read
Introduction

In the fast-paced world of game development, efficiency is key. Developers often find themselves repeating mundane tasks that can consume valuable time and lead to human error. Imagine if you could automate these repetitive tasks, freeing up your time to focus on more creative aspects of your game. In this article, we'll explore how creating custom tools in Unity can transform your workflow, boost productivity, and reduce the risk of mistakes.

Why Custom Tools Matter

Custom tools are tailored solutions that address specific needs in your development process. Here's why they are crucial:

Efficiency: Automate routine tasks to save time and reduce manual effort.
Consistency: Ensure that repetitive tasks are executed uniformly across your project.
Error Reduction: Minimize human errors by automating processes that are prone to mistakes.
Focus on Creativity: Spend more time on innovative aspects of your game rather than getting bogged down with repetitive tasks.

Creating Your First Custom Tool

Overview: We'll walk through the process of creating a simple Unity editor tool to automate tasks like aligning game objects or batch renaming assets.

1. Setting Up Your Editor Window

Define the Purpose: Clearly outline what your tool will accomplish.
Create a New Editor Window: Use Unity's editor scripting API to create a custom window.
Add Basic UI Elements: Incorporate buttons, sliders, or input fields to interact with the tool.

2. Implementing Core Functionality

Aligning Game Objects: Write scripts to align selected game objects in the scene.
Batch Renaming Assets: Create a script that renames multiple assets based on a naming convention (a minimal sketch covering both of these ideas appears at the end of this article).

3. Tips for Effective Custom Tools

Start Simple: Begin with a basic tool and gradually add complexity as needed.
Prioritize Usability: Ensure your tool is intuitive and easy to use, even for developers who may not be familiar with the script.
Document Your Code: Include comments and documentation to make future updates easier.

How Custom Tools Solve Common Issues

Custom tools address several common development challenges:

Repetitive Tasks: Automate repetitive processes like object alignment or asset management to streamline your workflow.
Consistency Issues: Ensure that tasks are performed uniformly across your project, avoiding discrepancies and errors.
Time Management: Free up time for more complex and creative aspects of your game development by automating mundane tasks.

Let's dive into the hands-on section

We've all encountered broken game objects in our scenes, and manually searching through every object to find missing script references can be tedious and time-consuming. One of the key advantages of editor scripts is the ability to create a tool that automatically scans all game objects and pinpoints exactly where the issues are.

Script Dependency Checker

To use this script, simply place it in the Editor folder within your Assets directory; the script needs to be in the Editor folder to function properly. Here's the code that creates a new menu item, which you'll find in the Editor menu bar. The script invokes the CheckDependencies method, which scans all game objects in the scene, checks for any missing components, and collects them in a list. The results are then displayed through the editor window using the OnGUI function.
using UnityEngine;
using UnityEditor;

public class ScriptDependencyChecker : EditorWindow
{
    private static Vector2 scrollPosition;
    private static string[] missingScripts = new string[0];

    [MenuItem("Tools/Script Dependency Checker")]
    public static void ShowWindow()
    {
        GetWindow<ScriptDependencyChecker>("Script Dependency Checker");
    }

    private void OnGUI()
    {
        if (GUILayout.Button("Check Script Dependencies"))
        {
            CheckDependencies();
        }

        if (missingScripts.Length > 0)
        {
            EditorGUILayout.LabelField("Objects with Missing Scripts:", EditorStyles.boldLabel);
            scrollPosition = EditorGUILayout.BeginScrollView(scrollPosition, GUILayout.Height(300));
            foreach (var entry in missingScripts)
            {
                EditorGUILayout.LabelField(entry);
            }
            EditorGUILayout.EndScrollView();
        }
        else
        {
            EditorGUILayout.LabelField("No missing scripts found.");
        }
    }

    private void CheckDependencies()
    {
        var missingList = new System.Collections.Generic.List<string>();
        // A missing (deleted) script shows up as a null component on the GameObject.
        GameObject[] allObjects = GameObject.FindObjectsOfType<GameObject>();
        foreach (var obj in allObjects)
        {
            var components = obj.GetComponents<Component>();
            foreach (var component in components)
            {
                if (component == null)
                {
                    missingList.Add($"Missing script on GameObject: {obj.name}");
                }
            }
        }
        missingScripts = missingList.ToArray();
    }
}

Now, let's head over to the Unity Editor and start using this tool. As shown in Image 01, you'll find the Script Dependency Checker under Tools | Script Dependency Checker in the menu bar.

Image 01 - Unity's menu bar

When you click on it, a window will open with a button and a debug section that will display any game objects with missing script references, if found. You can see this in Image 02.

Image 02 - Script dependency window

After pressing the button, we discovered a game object named AudioManager with a missing script, as shown in Image 03.

Image 03 - Results of the Checker

Next, we can search for AudioManager in the hierarchy and address the issue by either reassigning the missing script or removing it entirely if it's no longer needed, as shown in Image 04.

Image 04 - Game Object with missing script

Learning More

Explore Unity Documentation: Unity's official documentation provides comprehensive guides on editor scripting.
Join Developer Communities: Engage with forums and communities like the Unity Developer Community or Stack Overflow to exchange ideas and get support.
Experiment with Examples: Study and modify existing tools to understand their functionality and apply similar concepts to your projects.

Conclusion

Creating custom tools in Unity not only enhances your productivity but also ensures a smoother and more efficient development process. As you experiment with building and implementing your tools, consider other repetitive tasks that could benefit from automation. Whether it's organizing project folders or generating procedural content, the possibilities are endless. By leveraging custom tools, you'll gain more control over your development environment and focus on what truly matters: bringing your game to life.
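As a closing illustration, here is the minimal sketch promised in the "Implementing Core Functionality" list above, covering the "aligning game objects" and "batch renaming assets" ideas. This is not code from the article or the book: the QuickBatchTool class name, the menu path, and the prefix-based naming convention are assumptions chosen for illustration, and the script relies only on standard Unity editor APIs (EditorWindow, Selection, Undo, and AssetDatabase). Like the dependency checker, it is assumed to live in an Editor folder.

using UnityEngine;
using UnityEditor;

// Hypothetical example: class name, menu path, and naming convention are illustrative assumptions.
public class QuickBatchTool : EditorWindow
{
    private string prefix = "SM_"; // assumed naming convention for renamed assets

    [MenuItem("Tools/Quick Batch Tool")]
    public static void ShowWindow()
    {
        GetWindow<QuickBatchTool>("Quick Batch Tool");
    }

    private void OnGUI()
    {
        // Align every selected scene object to the Y position of the active (last clicked) object.
        if (GUILayout.Button("Align Selected To Active (Y)") && Selection.activeTransform != null)
        {
            float targetY = Selection.activeTransform.position.y;
            foreach (var selected in Selection.transforms)
            {
                Undo.RecordObject(selected, "Align Selected");
                var position = selected.position;
                position.y = targetY;
                selected.position = position;
            }
        }

        prefix = EditorGUILayout.TextField("Asset name prefix", prefix);

        // Rename every selected project asset by prepending the prefix.
        if (GUILayout.Button("Prefix Selected Assets"))
        {
            foreach (var obj in Selection.objects)
            {
                string path = AssetDatabase.GetAssetPath(obj);
                if (string.IsNullOrEmpty(path) || obj.name.StartsWith(prefix))
                {
                    continue; // skip scene objects (no asset path) and already-prefixed assets
                }
                AssetDatabase.RenameAsset(path, prefix + obj.name);
            }
            AssetDatabase.SaveAssets();
        }
    }
}

The Undo.RecordObject call keeps the alignment step reversible in the editor, and the empty-path check makes the rename loop skip scene objects, which have no asset path.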
For more insights into Unity development and custom tools, check out my book, Mastering Unity Game Development with C#: Harness the Full Potential of Unity 2022 Game Development Using C#, where you'll find in-depth guides and practical examples to further enhance your game development skills.

Author Bio

Mohamed Essam is a highly skilled Unity developer with expertise in creating captivating gameplay experiences across various platforms. With a solid background in game development spanning over four years, he has successfully designed and implemented engaging gameplay mechanics for mobile devices and other platforms. His current focus lies in the development of a highly popular multiplayer game, boasting an impressive 20 million downloads. Equipped with a deep understanding of cutting-edge technologies and a knack for creative problem solving, Mohamed Essam consistently delivers exceptional results in his projects.

Enhancing Data Quality with Cleanlab

Prakhar Mishra
11 Dec 2024
10 min read
Introduction

It is a well-established fact that your machine-learning model is only as good as the data it is fed. An ML model trained on bad-quality data usually suffers from a number of issues. Here are a few ways that bad data might affect machine-learning models:

1. Wrong predictions may be made as a result of errors, missing values, or other irregularities in low-quality data. The model's predictions are likely to be inaccurate if the data used for training is unreliable.
2. Bad data can also bias the model. The ML model can learn and reinforce these biases if the data is not representative of real-world situations, which can result in discriminatory predictions.
3. Poor data also limits the ability of the ML model to generalize to fresh data, because it may not effectively depict the underlying patterns and relationships in the data.
4. Models trained on bad-quality data might need more retraining and maintenance. The overall cost and complexity of model deployment could rise as a result.

As a result, it is critical to devote time and effort to data preprocessing and cleaning in order to decrease the impact of bad data on ML models. Furthermore, to ensure the model's dependability and performance, it is often necessary to use domain knowledge to recognize and address data quality issues.

It might come as a surprise, but gold-standard datasets like ImageNet, CIFAR, MNIST, 20News, and more also contain labeling issues. I have put in some examples below for reference.

One example is from the Amazon sentiment review dataset, where the original label was Neutral in both cases, whereas Cleanlab and Mechanical Turk labeled it as positive (which is correct).

Another is from the MNIST dataset, where the original labels were 8 and 0 respectively, which is incorrect. Instead, both Cleanlab and Mechanical Turk labeled them as 9 and 6 (which is correct).

Feel free to check out labelerrors to explore more such cases in similar datasets.

Introducing Cleanlab

This is where Cleanlab can come in handy as your best bet. It automatically identifies problems in your ML dataset and assists you in cleaning both data and labels. This data-centric AI software uses your existing models to estimate dataset problems that can be fixed to train even better models. The graphic below depicts the typical data-centric AI model development cycle.

Apart from the standard way of coding all the way through finding data issues, it also offers Cleanlab Studio - a no-code platform for fixing all your data errors. For the purpose of this blog, we will go the former way on our sample use case.

Getting Hands-on with Cleanlab

Installation

Installing cleanlab is as easy as doing a pip install. I recommend installing the optional dependencies as well; you never know what you need and when. I also installed sentence-transformers, as I would be using them for vectorizing the text. Sentence-transformers come with a bag of many amazing models; we particularly use 'all-mpnet-base-v2' as our choice for vectorizing text sequences. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search. Feel free to check out the sentence-transformers documentation for the list of all models and their comparisons.

pip install 'cleanlab[all]'
pip install sentence-transformers

Dataset

We picked the SMS Spam Detection dataset as our choice of dataset for doing the experimentation. It is a public set of labeled SMS messages that have been collected for mobile phone spam research, with roughly 5.5k instances in total. The below graphic gives a sneak peek of some of the samples from the dataset.

Data Preview

Code

Let's now delve into the code. For demonstration purposes, we inject 5% label noise into the dataset and see if we are able to detect it and eventually train a better model.

Note: I have also annotated every segment of the code wherever necessary for better understanding.

import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sentence_transformers import SentenceTransformer
from cleanlab.classification import CleanLearning
from sklearn.metrics import f1_score

# Reading and renaming data. Here we set sep='\t' because the data is tab separated,
# and header=None because the raw file has no header row.
data = pd.read_csv('SMSSpamCollection', sep='\t', header=None)
data.rename({0: 'label', 1: 'text'}, inplace=True, axis=1)

# Dropping any instance of duplicates that could exist
data.drop_duplicates(subset=['text'], keep=False, inplace=True)

# Original data distribution for spam and not spam (ham) categories
print(data['label'].value_counts(normalize=True))
# ham     0.865937
# spam    0.134063

# Adding noise. Switching 5% of ham data to the 'spam' label
tmp_df = data[data['label'] == 'ham']
examples_to_change = int(tmp_df.shape[0] * 0.05)
print(f'Changing examples: {examples_to_change}')
examples_text_to_change = tmp_df.head(examples_to_change)['text'].tolist()
changed_df = pd.DataFrame([[i, 'spam'] for i in examples_text_to_change])
changed_df.rename({0: 'text', 1: 'label'}, axis=1, inplace=True)
left_data = data[~data['text'].isin(examples_text_to_change)]
final_df = pd.concat([left_data, changed_df])
final_df.reset_index(drop=True, inplace=True)
# Changing examples: 216

# Modified data distribution for spam and not spam (ham) categories
print(final_df['label'].value_counts(normalize=True))
# ham     0.840016
# spam    0.159984

raw_texts, raw_labels = final_df["text"].values, final_df["label"].values

# Splitting into train and test sets (the 80/20 split and random_state are assumed values)
raw_train_texts, raw_test_texts, raw_train_labels, raw_test_labels = train_test_split(
    raw_texts, raw_labels, test_size=0.2, random_state=42, stratify=raw_labels
)

# Converting labels into integers
encoder = LabelEncoder()
encoder.fit(raw_train_labels)
train_labels = encoder.transform(raw_train_labels)
test_labels = encoder.transform(raw_test_labels)

# Vectorizing text sequences using sentence-transformers
transformer = SentenceTransformer('all-mpnet-base-v2')
train_texts = transformer.encode(raw_train_texts)
test_texts = transformer.encode(raw_test_texts)

# Instantiating the model instance
model = LogisticRegression(max_iter=200)

# Wrapping the scikit-learn model with CleanLearning
cl = CleanLearning(model)

# Finding label issues in the train set
label_issues = cl.find_label_issues(X=train_texts, labels=train_labels)

# Picking the top 50 samples based on label quality scores
identified_issues = label_issues[label_issues["is_label_issue"] == True]
lowest_quality_labels = label_issues["label_quality"].argsort()[:50].to_numpy()

# Pretty-print the label issues detected by Cleanlab
def print_as_df(index):
    return pd.DataFrame(
        {
            "text": raw_train_texts,
            "given_label": raw_train_labels,
            "predicted_label": encoder.inverse_transform(label_issues["predicted_label"]),
        },
    ).iloc[index]

print_as_df(lowest_quality_labels[:5])

As we can see, Cleanlab assisted us in automatically removing the incorrect labels and training a better model with the same parameters and settings. In my experience, people frequently ignore data concerns in favor of building more sophisticated models to increase accuracy numbers. Improving data, on the other hand, is a pretty simple performance win. And, thanks to products like Cleanlab, it's become really simple and convenient.

Feel free to access and play around with the above code in the Colab notebook here.

Conclusion

In conclusion, Cleanlab offers a straightforward solution to enhance data quality by addressing label inconsistencies, a crucial step in building more reliable and accurate machine learning models. By focusing on data integrity, Cleanlab simplifies the path to better performance and underscores the significance of clean data in the ever-evolving landscape of AI. Elevate your model's accuracy by investing in data quality, and explore the provided code to see the impact for yourself.

Author Bio

Prakhar has a Master's in Data Science with over 4 years of experience in industry across various sectors like Retail, Healthcare, Consumer Analytics, etc. His research interests include Natural Language Understanding and generation, and he has published multiple research papers in reputed international publications in the relevant domain. Feel free to reach out to him on LinkedIn.