Author Posts


13 Apr 2026

5 min read

The Small-File Tax: How Compaction, Clustering, and Pruning Change Lakehouse Cost

Introduction

Same data, same engine, before and after tuning: what changes when hot partitions stop paying a per-file penalty.

A lakehouse can look cheap in storage and still be expensive to read.

The clue is usually a query that should be routine: yesterday’s data, one region, one status, a few columns. It hangs longer than it should, not because the engine is doing sophisticated analytics, but because it is working through too many files first. That overhead shows up in file listing, metadata evaluation, file-open cost, and the work required to decide what can be skipped.

That is the small-file tax. It builds quietly in the systems we actually run: micro-batches, CDC pipelines, frequent upserts, and incremental merges. Those patterns keep data fresh, but they also fragment the hottest part of the table. The storage bill may barely notice. The read path does.

Teams often misdiagnose this as a compute problem. They add more workers, and the query still spends too much time deciding what to read. Bigger clusters help less than they should when the table layout reflects ingest cadence more than query shape.

Why small files are expensive

Every file comes with fixed overhead.

Before the engine reads much useful data, it has to discover files, inspect metadata, use statistics, and decide whether partition pruning or file-level skipping can eliminate work. When a table contains thousands of undersized files, that fixed work starts to dominate.

The effect is easy to underestimate because it often hides in planning. Small-file tables spend more time getting ready to scan than they should. That leads to higher latency, more files touched, and more bytes read than the query really needed.

Predicate pushdown helps inside a file.
Pruning decides which files never needed to be read in the first place. If hot partitions are packed with tiny, poorly organized files, pushdown can only do so much.

The practical point is simple: the small-file problem is often a planning problem before it becomes a scan problem.

Benchmark setup

This piece is best read as a benchmark-informed engineering analysis, not a fresh benchmark report. I am not claiming new measured results here. The goal is to isolate layout as the variable and show how I would structure the comparison honestly.

Keep the engine the same. Keep the dataset the same. Change only the table layout.

A realistic setup would use one Spark-based fact table with columns such as event_ts, event_date, customer_id, region, event_type, order_status, and amount, partitioned by event_date. Then simulate frequent ingest into recent partitions so the table develops the same failure mode many production systems do: hot partitions filled with small files.

Run the same query set across three versions of the table:

● Baseline: many small files, no layout maintenance
● After compaction: fewer, better-sized files
● After clustering: same data, reorganized around common filter paths

The cleanest metrics are the ones operators already watch in production:

● file count in hot partitions
● average file size
● planning time
● total query runtime
● files scanned
● bytes read
● maintenance job runtime or rewritten bytes

That gives you an apples-to-apples way to ask the right question: how much of the query bill is really a file-layout problem?

Before tuning: what goes wrong

Before tuning, physical layout usually follows write cadence, not query shape.

Data lands every few minutes. Recent partitions collect another pile of small Parquet files. Analysts filter by event_date, region, customer_id, or order_status, while the table is effectively organized by when each write arrived.

Partition pruning still helps. It may eliminate older days quickly.
But that only gets you down to the hot partitions, which are often the messiest part of the table. If those partitions still contain too many small files, the engine has too many candidates to inspect.

That is why small-file tables often feel worse than their raw size suggests. A very large table can behave well if recent partitions are healthy. A much smaller table can feel slow if recent partitions are fragmented and badly laid out.

After tuning: what changes with compaction, clustering, and pruning

Once you separate the mechanics, the roles of the three controls become clearer.

Compaction reduces file count.

This is the first fix because it attacks the per-file penalty directly. Delta’s OPTIMIZE can compact small files into larger ones, and Delta’s auto compaction can do that automatically after writes. Iceberg’s rewrite_data_files does the same class of work through bin-packing. In Hudi, small-file management is broader: write-time auto-sizing and clustering address file layout generally, while compaction in the Hudi-specific sense applies to Merge-on-Read tables and merges log files back into base files.

Clustering improves locality.

Compaction alone can still leave you with a table that is neat but not selective. Clustering reorganizes data so values that are commonly filtered together live closer together. Delta supports ZORDER, and newer Delta versions also support liquid clustering for incrementally clustering data over time. Iceberg exposes sort-based and zorder(...) layouts through rewrite_data_files. Hudi supports clustering inline or asynchronously, including background operation while ingestion continues.

Pruning is where the engine collects the savings.

Delta uses automatically collected data-skipping statistics such as min and max values. Iceberg uses hidden partition transforms and metadata-driven planning so queries do not have to know the table’s physical layout.
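The three roles described above can be illustrated with a toy model in plain Python. This is a minimal sketch, not any engine's planner and not a benchmark: `FileStats`, `write_files`, and `files_to_scan` are hypothetical names invented for the illustration, the "files" are just in-memory records carrying a min and max for one column, and the arrival order is a synthetic permutation standing in for ingest-ordered customer_id values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: a name plus min/max of one column."""
    path: str
    rows: int
    min_val: int
    max_val: int


def write_files(values, rows_per_file):
    """Cut values, in the order given, into fixed-size files and record stats."""
    files = []
    for start in range(0, len(values), rows_per_file):
        chunk = values[start:start + rows_per_file]
        files.append(FileStats(
            path=f"part-{start // rows_per_file:05d}",
            rows=len(chunk),
            min_val=min(chunk),
            max_val=max(chunk),
        ))
    return files


def files_to_scan(files, lo, hi):
    """Min/max skipping: keep files whose value range overlaps [lo, hi]."""
    return [f for f in files if f.min_val <= hi and f.max_val >= lo]


# 1,000 customer_id values in "arrival order": a scattered permutation of
# 0..999 (37 and 1000 share no common factor, so every value appears once).
arrival_order = [(i * 37) % 1000 for i in range(1000)]

# Baseline: many small files, laid out in arrival order.
baseline = write_files(arrival_order, rows_per_file=10)
# Compaction: same order, fewer and larger files.
compacted = write_files(arrival_order, rows_per_file=100)
# Clustering: sort on the filter column before rewriting.
clustered = write_files(sorted(arrival_order), rows_per_file=100)

# Query predicate: customer_id BETWEEN 100 AND 119
for name, files in [("baseline", baseline),
                    ("compacted", compacted),
                    ("clustered", clustered)]:
    scanned = files_to_scan(files, 100, 119)
    print(f"{name:<10} files={len(files):>3} scanned={len(scanned):>3}")
```

In this toy setup, compaction cuts the file count tenfold, but every compacted file still spans almost the whole customer_id range, so min/max statistics cannot rule any of them out. Sorting on the filter column before rewriting leaves a single file whose range overlaps the predicate. Real engines add partition pruning, richer statistics, and multi-column layouts on top of this, so treat the sketch as intuition only.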
Hudi’s metadata table exists in part to avoid expensive file listing and to expose metadata such as file listings and column statistics for planning. Better layout improves all three paths.

The gains will vary by workload. Broad scans often benefit first from file-count reduction. More selective queries often benefit more when layout and statistics align with the columns people actually filter on.

What this means in practice

The operational lesson is not “run maintenance everywhere.” It is “run the right maintenance where the query bill is being generated.”

A few rules hold up well in practice:

● Measure hot partitions first. Whole-table size often hides where the pain actually lives.
● Fix file count before chasing elaborate layout. If the table is badly fragmented, compaction or file sizing is usually the first lever.
● Cluster around repeated predicates, not theoretical ones. Layout should follow the workload you really have.
● Treat maintenance as a workload. Compaction, clustering, and rewrite jobs consume real compute and rewrite real bytes.

One recurring mistake is trying to solve everything with partitioning alone. Delta’s clustering docs explicitly call out cases where a typical partition column would leave the table with too many or too few partitions. Iceberg’s hidden partitioning model exists in part to decouple query logic from rigid physical partition layout.

That is the real trade-off: not maintenance versus no maintenance, but where you want the cost to land.

Differences across Delta / Iceberg / Hudi

All three open table formats help with the same broad problem, but they expose different control surfaces.

Delta Lake exposes layout maintenance directly through OPTIMIZE, auto compaction, data skipping, and ZORDER. In newer Delta releases, liquid clustering adds an incremental clustering model for suitable tables, though it comes with its own feature and layout constraints.

Apache Iceberg leans heavily on metadata-driven planning.
Hidden partitioning, partition evolution, and metadata/manifests help the engine avoid work, while rewrite_data_files gives you bin-packing and sort-based rewrite paths, including zorder(...) support in Spark procedures.

Apache Hudi attacks the problem from both sides: it avoids small files during writes where possible, offers clustering as a table service, uses a metadata table to reduce file-listing bottlenecks, and on Merge-on-Read tables uses compaction to merge log files into base files. That makes Hudi especially natural in write-heavy and CDC-style systems.

Bottom line

A slow lakehouse is often a file-layout problem wearing a compute bill.

Compaction reduces file count. Clustering improves locality. Pruning is where the engine realizes the savings. Put together, they do more than speed up queries. They make read cost more predictable, especially on the hot partitions where modern pipelines do most of their damage.

That is why the small-file tax is such a useful way to frame the problem. It gives you a clean question: same data, same engine, before and after layout tuning, what changed in planning overhead, files scanned, and bytes read?

If you are working through those trade-offs now, I go deeper on these patterns in Engineering Lakehouses with Open Table Formats.

References

● Chapter 8 of Engineering Lakehouses with Open Table Formats
● Delta Lake Optimizations
● Delta Lake Liquid Clustering
● Apache Iceberg Partitioning and Hidden Partitioning
● Apache Iceberg Spark Procedures (rewrite_data_files)
● Apache Hudi Table Metadata
● Apache Hudi Compaction
● Apache Hudi File Sizing
● Apache Hudi Clustering

Author Bio

Vinoth Govindarajan is a seasoned data expert and staff software engineer at Apple Inc., where he spearheads data platforms using open-source technologies like Iceberg, Spark, Trino, and Flink. Before this, he worked on designing incremental ETL frameworks for real-time data processing at Uber.
He is a dedicated contributor to the open source community in projects such as Apache Hudi and dbt-spark. As a thought leader, Vinoth has shared his expertise through speaking engagements at conferences such as dbt Coalesce and Hudi OSS community meetups. He has published several blogs on building open lakehouses. Holding a bachelor's degree in information technology, Vinoth has also authored multiple research papers published in journals like IEEE.



Keyla Dolores Méndez, Carla Vanesa Mamani Chávez

06 Mar 2026

5 min read

When data is not enough: Why is meaning important in enterprise AI?



Ferenc Deak

27 Mar 2025

10 min read

From C++98 to C++23: The Arithmetic Mean, Benchmarked and Optimized

Averaging algorithms are computational techniques used to determine the mean or average value of a set of numbers. They are widely used in machine learning (model optimization), signal processing (noise reduction), finance (trend analysis), and networking (traffic smoothing). Some common averaging algorithms include Arithmetic Mean (Simple Average), Moving Average (Sliding Window Average), Harmonic Mean, and Geometric Mean.

In this article we will explore one of the most common averaging algorithms: the Arithmetic Mean, and its implementation in C++. We will explore how to compute the average of a vector using five different approaches, each corresponding to a major C++ standard, from C++98 to C++23. Additionally, we will benchmark these implementations to analyze their performance and efficiency. Finally, we will dig into the code generated by the two most used compilers on Linux platforms, GCC and Clang, and perform a thorough analysis and see which one comes out the winner.

Readers of this article will gain a deeper understanding of practical C++ implementation across different standards: you will see how to compute the arithmetic mean of a vector using various C++ techniques and understand the evolution of the language from C++98 to C++23. You will also gain insights into optimizing code for performance, while keeping Modern C++ coding practices in mind and learn how they can be used to write efficient and readable code.

A Brief History of C++ Standards

C++ was first standardized in 1998 as C++98, which provided a stable foundation for the language. It was followed by C++03, a minor update that mainly focused on defect fixes, but without a significant change to the language mechanics or syntax.
The major overhaul came with C++11, which introduced modern programming features such as auto, lambda expressions, smart pointers, and noteworthy syntax changes.

C++14 refined these features and introduced new constructs to the Standard Template Library (STL), while C++17 improved performance with parallel algorithms and introduced std::optional together with new compile-time expressions and the removal of some older language syntax. C++20 again brought revolutionary changes like concepts, ranges, and coroutines, enhancing the expressiveness of the language, and on top of that introduced new syntax and operators. Finally, C++23 builds upon these improvements with even more enhancements to ranges and better support for functional-style programming.

The question naturally arises: why this quick detour through C++ history? Three main reasons drive this. First, we aim to emphasize the remarkable evolution of the language itself. This context is essential for understanding what we’ll demonstrate next: five distinct implementations of a basic algorithm. Second, we’re keen to explore the increasing expressiveness of C++ syntax across its versions. And finally, driven by curiosity, we’ll also touch upon the question of performance: does adopting a newer standard automatically lead to faster code?

Computing the Average: Five Approaches

Let us now delve into five versions of a function that calculates the average of a vector of random integers. Each version uses a different C++ standard, demonstrating how modern C++ simplifies common programming tasks and makes them more readable. Before introducing them, here is the code that generates the vector:

1. C++98: Classic for-loop Approach

In C++98, we relied on manual iteration using a basic for loop:

This approach was straightforward but lacked modern conveniences like range-based loops or built-in algorithms.

2. C++11: Range-based Loop

With C++11, we gained access to range-based loops, making iteration more concise and readable:

This improved readability by removing explicit indexing and is one of the many usability improvements introduced in C++11.

3. C++14: Using std::accumulate

The C++14 version uses the functional-style algorithm std::accumulate to sum the elements. Note that std::accumulate itself is not new in C++14; it has been part of the <numeric> header since C++98:

This approach is more declarative and eliminates the need for a loop. C++14 focused on minor refinements and usability improvements over C++11.

4. C++17: Using std::reduce

C++17 introduced std::reduce, which may combine elements in any order and is therefore suited to parallel execution:

std::reduce provides potential performance gains by allowing parallel execution when used with std::execution::par. C++17 marked a shift towards greater use of parallelism and optimization in the Standard Library.

5. C++23: Using std::ranges::fold_left

With C++23, we can leverage std::ranges::fold_left to compute the sum in a more functional programming style:

This is the most modern and expressive approach, benefiting from the new std::ranges library. C++23 continues to improve upon the functional and declarative paradigms introduced in C++20.

Benchmarking: Performance Comparison

To analyze the performance of these implementations, we have developed a benchmarking function that executes each function 1000 times and calculates the average execution time in nanoseconds:

The preceding function ensures more reliable results by mitigating fluctuations caused by system variations.

Results

We tested these implementations on a randomly generated vector of 500000 elements, on a machine with a 13th Gen Intel i7–13700H (20) @ 4.800GHz CPU.
Here’s a sample output using the latest version of the two major compilers as of February 2025. (However, please take this with a pinch of salt, like everything related to timed execution on a modern operating system: context switches, multiple processes competing for the CPU, Linux juggling all of these, and so on.)

Here are two example outputs:

Compiled by: clang++ - 21.0.0
Vector size: 500000 elements
C++98 Average: 50.458700 | Avg Time: 42083.258000 ns (over 1000 runs)
C++11 Average: 50.458700 | Avg Time: 40607.151000 ns (over 1000 runs)
C++14 Average: 50.458700 | Avg Time: 40443.039000 ns (over 1000 runs)
C++17 Average: 50.458700 | Avg Time: 72351.570000 ns (over 1000 runs)
C++23 Average: 50.458700 | Avg Time: 44660.055000 ns (over 1000 runs)



M.T. White

16 Dec 2024

10 min read

Revolutionising Work and Everyday Life with ChatGPT

Introduction

ChatGPT is a revolutionary new technology that is making a large impact on society. The full impact of ChatGPT cannot be known at the time of writing this article because of how novel the technology is. However, what can be said is that since its introduction many industries have been trying to leverage it to increase productivity. Simultaneously, everyday people are trying to learn to leverage it as well. Overall, ChatGPT and similar systems are very new, and it will take more time for their full impact to become clear. This article explores how ChatGPT can be used in everyday life through a few use cases.

What is ChatGPT?

Before we begin, it is important to understand what ChatGPT is and what it isn’t. To begin, ChatGPT is, in a lay sense, a very advanced chatbot. More specifically, ChatGPT is a generative AI that uses Natural Language Processing (NLP) to create a dialog between a user and itself. ChatGPT and similar systems are known as Large Language Models (LLMs). In short, for AI models to work they have to be trained on data. To train LLMs, engineers use vast amounts of text such as books, articles, journals, and so on. The result is a system like ChatGPT that has a vast knowledge base on many different subjects. Before we can explore how to use ChatGPT for everyday life, we need to explore how NOT to use ChatGPT.

How not to use ChatGPT?

ChatGPT is very powerful and can be used for many different things; however, it is important to understand that ChatGPT is neither a sage nor infallible. Remember, ChatGPT only knows what it was trained on. This means that if the information it was taught was wrong or outdated, so too will be the response it provides. It also means that, as of writing this article, ChatGPT cannot and should not be used as a replacement for humans. Even with the answer ChatGPT gives, a decent level of domain knowledge is still required to properly format and use its response.
As such, it is important to take its suggestions with a certain amount of skepticism. The best way to think of ChatGPT is as an extremely smart friend with a wide range of knowledge. Though that friend may be very knowledgeable, they can and will be wrong from time to time. With that, how does one use ChatGPT?

How to use ChatGPT for everyday life?

Using ChatGPT starts with what is called a prompt. A prompt can be thought of as a query that the user feeds into ChatGPT. This prompt is kind of like a Google search. It will be a statement that asks ChatGPT a question or to perform a certain task. Since ChatGPT uses NLP, the prompt does not have to be formatted a certain way, nor does it require a specific syntax. In other words, a prompt can be entered the same way a user would pose the instruction to another person. However, it is good to provide ChatGPT with enough information to adequately describe the problem. An example of a prompt is as follows:

How can ChatGPT be used for everyday life?

Once the prompt is run, ChatGPT will produce text with suggestions, answers, software code, or whatever was asked for. The output that is produced is called a response, which will have the answer to the request or question.

There are a lot of ways ChatGPT can be used for everyday life. In terms of professional life, ChatGPT can make an excellent assistant, especially for “text-based” jobs, that is, jobs that don’t require design work such as drawing diagrams. For example, suppose a programmer needs a small piece of software for a project: a SQL query that can pull all the rows between two dates. The user could use a prompt like the following:

Write a SQL query that will pull all rows between January 3 and March 31.

This prompt will produce the following response:

To pull all rows between January 3 and March 31 from a table, you'll need to use the WHERE clause to filter the results based on the date column.
Assuming you have a date column named "date_column" in your table, here's the SQL query:

    SELECT * FROM your_table
    WHERE date_column >= '2023-01-03' AND date_column <= '2023-03-31';

Replace "your_table" with the actual name of your table, and "date_column" with the name of the column that contains the dates you want to filter. The query will select all rows where the "date_column" is on or after January 3, 2023, and on or before March 31, 2023.

For those who are not familiar with SQL, the prompt produced a simple query with generic placeholders that will pull the correct data out of a database.

Not everyone is a programmer, so how could a non-programmer use ChatGPT as an assistant? One area in which ChatGPT can be used is composing emails. Suppose a salesperson needs to inform Tom, their boss, via email that a $1 million deal was closed today. They could use a prompt like the following:

Write Tom an email saying that the deal with Company X went through, and we made $1 million.

The response would be something along the lines of:

Subject: Successful Deal with Company X - $1 Million Revenue Achieved

Hi Tom,

I hope this email finds you well. I am thrilled to share some exciting news with you! The deal with Company X has successfully gone through, and I am delighted to inform you that we have made an astounding $1 million in revenue from this collaboration.

This achievement wouldn't have been possible without the hard work and dedication of our entire team. It reflects the trust and confidence our partners have in our capabilities, and I believe it marks a significant milestone for our company's growth and success.

I want to express my gratitude to you for your valuable contributions and expertise that played a crucial role in making this deal a reality. Your efforts have been commendable, and I am truly grateful for your commitment to the success of this venture.

Let's take a moment to celebrate this triumph together.
Your exceptional skills and commitment to excellence have made this accomplishment possible, and I am genuinely proud to have you as part of our team.

Looking forward to even more prosperous ventures in the future. Once again, congratulations to you and the entire team!

Best regards,
[Your Name]
[Your Title/Position]
[Company Name]
[Email]
[Phone Number]

In this case, all the user would have to do is proofread the email, copy it, and send it. For applications such as this, proofreading is very important. As stated before, ChatGPT is neither a mind reader nor a sage; as such, it may produce emails with wrong information, illogical wording, or other problems. So, it’ll pay to read the message before sending it.

Learning Tool

A very powerful use of ChatGPT is as a learning tool. For example, students can use this tool to help them master a new subject, generate practice problems, and even check answers to an extent. Assume the user needs a few extra practice problems for their English homework. They could enter a prompt like the following:

Generate some grammar practice problems for a 1st year English class

Which would yield the following output:

Of course! Here are some grammar practice problems suitable for a 1st-year English class:

1. Subject-Verb Agreement: Choose the correct verb form that agrees with the subject in each sentence:
a) The cat (purr/purrs) when it is happy.
b) My friend and I (is/are) going to the movies tonight.
c) Neither the students nor the teacher (understands/understand) the assignment.

2. Pronoun Usage: Select the appropriate pronoun to complete each sentence:
a) She invited Sarah and (I/me) to the party.
b) My sister and (her/she) are going shopping.
c) The book is for (he/him) and his brother.

3. Articles (a/an/the): Insert the correct article (a, an, the, or no article) in each sentence:
a) Can you pass me (a/an) apple, please?
b) I need to buy (the) new shoes for the dance.
c) My friend has (no article) car.

4. Prepositions: Choose the correct preposition to complete each sentence:
a) We arrived (in/on/at) the airport fifteen minutes early.
b) The cat is sleeping (under/below/above) the table.
c) She's going to the store (by/with/for) her mom.

5. Verb Tenses: Identify the correct verb tense used in each sentence:
a) They (will be/was) studying for the exam all night.
b) By the time we arrived, the party (had started/has started).
c) I (am/was) watching a movie right now.

6. Adjective vs. Adverb: Decide whether the word in parentheses should be an adjective or adverb in the sentence:
a) She sings (beautiful/beautifully).
b) The (happy/happily) children played in the park.
c) The (quick/quickly) fox jumped over the fence.

7. Negation: Add the necessary negative word to each sentence:
a) I have (no/not) seen him today.
b) She does (not/none) like coffee.
c) We will (not/none) be attending the party.

8. Passive Voice: Rewrite the following sentences in the passive voice:
a) The chef cooked the meal.
b) The teacher is explaining the lesson.
c) The storm destroyed the old building.

9. Comparative and Superlative Forms: Complete each sentence with the correct comparative or superlative form of the adjective:
a) This book is (interesting/more interesting/most interesting) than the last one.
b) She is (tall/taller/tallest) than her sister.
c) It was the (big/biggest/bigger) cake at the party.

10. Reported Speech: Rewrite the following sentences using reported speech:
a) Mary said, "I am going to the store."
b) "I will call you later," said John.
c) "They have arrived," she said.

These grammar practice problems aim to reinforce essential grammar rules and concepts, helping 1st-year English students develop their language skills and proficiency.

As can be seen, ChatGPT generated a number of questions that one would usually find in an English 101 class. If the user wanted a more specific answer, they could reword the prompt accordingly.
ChatGPT is also a great teaching tool. Teachers can use it to generate lesson plans, write intros for learning modules, generate problems, and more. Overall, ChatGPT can be used as an excellent tool for academia in general.

Now, it is important to remember that ChatGPT is not all-knowing. It can only generate responses based on the data that it was trained on. This means that if the user asks questions about a subject where there is little or no publicly available data, the response probably isn’t going to be of the highest quality. For example, if someone asked

How do I build a working time machine?

They’re going to get the following:

Building a working time machine, as depicted in science fiction, is currently beyond our technological capabilities and goes against the known laws of physics. Time travel, as popularly imagined, involves traveling backward or forward in time, similar to how we move through space.

As of my last update in September 2021, time travel remains a concept explored mainly in theoretical physics, and no practical time machine has been built or proven to be possible. Several theories, such as the theory of relativity, suggest that time travel could be theoretically feasible under certain conditions, but these ideas are still purely theoretical and have not been put into practice.

It's important to remember that building a time machine would present enormous scientific, technological, and ethical challenges. It could raise paradoxes, such as the "grandfather paradox," where a time traveler could potentially alter past events and create inconsistencies in history.

As of now, time travel remains a fascinating topic in science fiction and theoretical physics, but it is not something that can be practically accomplished with our current understanding and technology.
The scientific community continues to explore the nature of time and the universe, but building a working time machine as portrayed in movies and books is not yet a reality.

ChatGPT gave some general information on time travel but, unfortunately, it didn’t generate any schematics. Again, this goes back to ChatGPT not being a sage, as it only knows what it was “taught”. Since there is no data on how to build a working time machine, ChatGPT could not fully answer the prompt.

Now, if one looks at the English question prompt above, one will notice that the practice questions were quite broad. It is common to have to “dig” with ChatGPT. In other words, a person may have to refine their queries to get what they need. To get some practice, try rewording the prompt to generate practice questions for only the passive voice.

Summary

ChatGPT is a tool, and like any other tool, what it’s used for is up to the user. As was seen in this article, ChatGPT is an excellent tool for helping a person through their day by generating software, emails, and so on. ChatGPT can also be used as a great learning or teaching device to help students and teachers generate practice problems, create lesson plans, and so much more. However, as was stated several times, unless ChatGPT has been trained on something, it does not know about it. This means that asking it about things like how to build a time machine, or about domain-specific concepts, isn’t going to return quality responses. Also, even if ChatGPT has been trained on the subject of the prompt, it may not always generate a quality response. No matter the use case, the response should be vetted for accuracy. This may mean doing a little extra research with the response given, testing the output, or whatever needs to be done to verify the response.

Overall, ChatGPT at the time of writing this article is less than a year old. This means that the full implications of using ChatGPT are not fully understood. Also, how to fully leverage ChatGPT is not understood yet either.
What can be said is that ChatGPT and similar LLM systems will probably be the next Google. In terms of everyday use, the only true inhibitors are the user's imagination and the data that was used to train ChatGPT.

Author Bio

M.T. White has been programming since the age of 12. His fascination with robotics flourished when he was a child programming microcontrollers such as Arduino. M.T. currently holds an undergraduate degree in mathematics, and a master's degree in software engineering, and is currently working on an MBA in IT project management. M.T. is currently working as a software developer for a major US defense contractor and is an adjunct CIS instructor at ECPI University. His background mostly stems from the automation industry where he programmed PLCs and HMIs for many different types of applications. M.T. has programmed many different brands of PLCs over the years and has developed HMIs using many different tools.

Author of the book: Mastering PLC Programming



Prakhar Mishra

11 Dec 2024

10 min read

Enhancing Data Quality with Cleanlab

Introduction

It is a well-established fact that your machine-learning model is only as good as the data it is fed. An ML model trained on bad-quality data usually has a number of issues. Here are a few ways that bad data might affect machine-learning models:

1. Wrong predictions may be made as a result of errors, missing values, or other irregularities in low-quality data. The model's predictions are likely to be inaccurate if the data used to train it is unreliable.
2. Bad data can also bias the model. The ML model can learn and reinforce these biases if the data is not representative of real-world situations, which can result in discriminatory predictions.
3. Poor data also hampers the ability of an ML model to generalize to fresh data, because it may not effectively depict the underlying patterns and relationships.
4. Models trained on bad-quality data might need more retraining and maintenance. The overall cost and complexity of model deployment could rise as a result.

As a result, it is critical to devote time and effort to data preprocessing and cleaning in order to decrease the impact of bad data on ML models. Furthermore, to ensure the model's dependability and performance, it is often necessary to use domain knowledge to recognize and address data quality issues.

It might come as a surprise, but gold-standard datasets like ImageNet, CIFAR, MNIST, 20News, and more also contain labeling issues. I have put in some examples below for reference.

The above snippet is from the Amazon sentiment review dataset, where the original label was Neutral in both cases, whereas Cleanlab and Mechanical Turk said it to be positive (which is correct).

The above snippet is from the MNIST dataset, where the original labels were marked as 8 and 0 respectively, which is incorrect. Instead, both Cleanlab and Mechanical Turk said they were 9 and 6 (which is correct).

Feel free to check out labelerrors to explore more such cases in similar datasets.

Introducing Cleanlab

This is where Cleanlab can come in handy. It automatically identifies problems in your ML dataset and assists you in cleaning both data and labels. This data-centric AI software uses your existing models to estimate dataset problems that can be fixed to train even better models. The graphic below depicts the typical data-centric AI model development cycle:

Apart from the standard way of coding all the way through finding data issues, it also offers Cleanlab Studio, a no-code platform for fixing your data errors. For the purpose of this blog, we will go the former way on our sample use case.

Getting Hands-on with Cleanlab

Installation

Installing cleanlab is as easy as doing a pip install. I recommend installing the optional dependencies as well; you never know what you need and when. I also installed sentence-transformers, as I will be using it for vectorizing the text. Sentence transformers come with many amazing models; we use 'all-mpnet-base-v2' for vectorizing text sequences. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search. Feel free to check out the full list of models and their comparisons.

pip install 'cleanlab[all]'
pip install sentence-transformers

Dataset

We picked the SMS Spam Detection dataset for our experiment. It is a public set of labeled SMS messages collected for mobile phone spam research, with roughly 5.5k instances in total. The graphic below gives a sneak peek of some of the samples from the dataset.

Data Preview

Code

Let’s now delve into the code.
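Before the walkthrough, it may help to see the idea of using an existing model to estimate label problems in miniature. The sketch below is not Cleanlab's implementation. It assumes we already have out-of-sample predicted probabilities from some classifier (the numbers here are made up for illustration) and scores each example by the probability the model assigns to its given label, so that low scores point at suspicious labels. The function names `label_quality` and `rank_suspects` are hypothetical.

```python
def label_quality(pred_probs, labels):
    """Return, for each example, the predicted probability of its given label."""
    return [probs[label] for probs, label in zip(pred_probs, labels)]


def rank_suspects(pred_probs, labels, top_k=2):
    """Return the indices of the top_k examples with the lowest label quality."""
    scores = label_quality(pred_probs, labels)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[:top_k]


# Made-up probabilities over two classes: index 0 = ham, index 1 = spam.
pred_probs = [
    [0.95, 0.05],  # confidently ham
    [0.10, 0.90],  # confidently spam
    [0.92, 0.08],  # confidently ham, but labeled spam below
    [0.40, 0.60],  # leaning spam, but labeled ham below
]
labels = [0, 1, 1, 0]  # given (possibly noisy) labels

scores = label_quality(pred_probs, labels)
suspects = rank_suspects(pred_probs, labels, top_k=2)
print(scores)
print(suspects)
```

The third example gets the lowest score because the model strongly disagrees with its given label. That kind of disagreement is what a label-issue finder looks for, although a real tool does considerably more than this ranking.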
For demonstration purposes, we inject 5% noise into the dataset and see whether we are able to detect it and eventually train a better model.

Note: I have also annotated every segment of the code wherever necessary for better understanding.

import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sentence_transformers import SentenceTransformer
from cleanlab.classification import CleanLearning
from sklearn.metrics import f1_score

# Reading and renaming data. We set sep='\t' because the data is tab separated,
# and header=None because the file has no header row (columns are named 0 and 1).
data = pd.read_csv('SMSSpamCollection', sep='\t', header=None)
data.rename({0: 'label', 1: 'text'}, inplace=True, axis=1)

# Dropping any instance of duplicates that could exist
data.drop_duplicates(subset=['text'], keep=False, inplace=True)

# Original data distribution for spam and not spam (ham) categories
print(data['label'].value_counts(normalize=True))
# ham     0.865937
# spam    0.134063

# Adding noise. Switching 5% of ham data to 'spam' label
tmp_df = data[data['label'] == 'ham']
examples_to_change = int(tmp_df.shape[0] * 0.05)
print(f'Changing examples: {examples_to_change}')
# Changing examples: 216
examples_text_to_change = tmp_df.head(examples_to_change)['text'].tolist()
changed_df = pd.DataFrame([[i, 'spam'] for i in examples_text_to_change])
changed_df.rename({0: 'text', 1: 'label'}, axis=1, inplace=True)
left_data = data[~data['text'].isin(examples_text_to_change)]
final_df = pd.concat([left_data, changed_df])
final_df.reset_index(drop=True, inplace=True)

# Modified data distribution for spam and not spam (ham) categories
print(final_df['label'].value_counts(normalize=True))
# ham     0.840016
# spam    0.159984

raw_texts, raw_labels = final_df["text"].values, final_df["label"].values

# Splitting into train and test sets. The split step is missing from the
# listing above; default split parameters are assumed here.
raw_train_texts, raw_test_texts, raw_train_labels, raw_test_labels = train_test_split(raw_texts, raw_labels)

# Converting labels into integers
encoder = LabelEncoder()
encoder.fit(raw_train_labels)
train_labels = encoder.transform(raw_train_labels)
test_labels = encoder.transform(raw_test_labels)

# Vectorizing text sequences using sentence-transformers
transformer = SentenceTransformer('all-mpnet-base-v2')
train_texts = transformer.encode(raw_train_texts)
test_texts = transformer.encode(raw_test_texts)

# Instantiating the model
model = LogisticRegression(max_iter=200)

# Wrapping the scikit-learn model with CleanLearning
cl = CleanLearning(model)

# Finding label issues in the train set
label_issues = cl.find_label_issues(X=train_texts, labels=train_labels)

# Picking the 50 samples with the lowest label quality scores
identified_issues = label_issues[label_issues["is_label_issue"] == True]
lowest_quality_labels = label_issues["label_quality"].argsort()[:50].to_numpy()

# Pretty-print the label issues detected by Cleanlab
def print_as_df(index):
    return pd.DataFrame(
        {
            "text": raw_train_texts,
            "given_label": raw_train_labels,
            "predicted_label": encoder.inverse_transform(label_issues["predicted_label"]),
        },
    ).iloc[index]

print_as_df(lowest_quality_labels[:5])

As we can see, Cleanlab assisted us in automatically removing the incorrect labels and training a better model with the same parameters and settings. In my experience, people frequently ignore data concerns in favor of building more sophisticated models to increase accuracy numbers. Improving data, on the other hand, is a pretty simple performance win. And, thanks to products like Cleanlab, it has become really simple and convenient.

Feel free to access and play around with the above code in the Colab notebook here.

Conclusion

In conclusion, Cleanlab offers a straightforward solution to enhance data quality by addressing label inconsistencies, a crucial step in building more reliable and accurate machine learning models. By focusing on data integrity, Cleanlab simplifies the path to better performance and underscores the significance of clean data in the ever-evolving landscape of AI. Elevate your model's accuracy by investing in data quality, and explore the provided code to see the impact for yourself.

Author Bio

Prakhar has a Master’s in Data Science and over 4 years of industry experience across sectors such as Retail, Healthcare, and Consumer Analytics. His research interests include Natural Language Understanding and Generation, and he has published multiple research papers in reputed international publications in the relevant domain. Feel free to reach out to him on LinkedIn.

Gogula Aryalingam

29 Nov 2024

5 min read

Unlocking Insights: How Power BI Empowers Analytics for All Users

Introduction

In today’s data-driven world, businesses rely heavily on robust tools to transform raw data into actionable insights. Among these tools, Microsoft Power BI stands out as a leader, renowned for its versatility and user-friendliness. From its humble beginnings as an Excel add-in, Power BI has evolved into a comprehensive enterprise business intelligence platform, competing with industry giants like Tableau and Qlik. This journey of transformation reflects not only Microsoft’s innovation but also the growing need for accessible, scalable analytics solutions.

As a data specialist who has transitioned from traditional data warehousing to modern analytics platforms, I’ve witnessed firsthand how Power BI empowers both technical and non-technical users. It has become an indispensable tool, offering capabilities that bridge the gap between data modeling and visualization, catering to everyone from citizen developers to seasoned data analysts. This article explores the evolution of Power BI, its role in democratizing data analytics, and its integration into broader solutions like Microsoft Fabric, highlighting why mastering Power BI is critical for anyone pursuing a career in analytics.

The Changing Tide for Data Analysts

When you think of business intelligence in the modern era, Power BI is often the first tool that comes to mind. However, this wasn't always the case. Originally launched as an add-in for Microsoft Excel, Power BI evolved within a few years into a comprehensive enterprise business intelligence platform competing with the likes of Qlik and Tableau, a true testament to its capabilities. As a data specialist, what really impresses me about Power BI's evolution is how Microsoft has continuously improved its user-friendliness, making both data modeling and visualization more accessible to technical professionals and business users alike.
As a data specialist, initially working with traditional data warehousing, and now with modern data lakehouse-based analytics platforms, I’ve come to appreciate the capabilities that Power BI brings to the table. It empowers me to go beyond the basics, allowing me to develop detailed semantic layers and create impactful visualizations that turn raw data into actionable insights. This capability is crucial in delivering truly comprehensive, end-to-end analytics solutions. For technical folk like me, by building on our experiences working with these architectures and the deep understanding of the technologies and concepts that drive them, integrating Power BI into the workflow is a smooth and intuitive process. The transition to including Power BI in my solutions feels almost like a natural progression, as it seamlessly complements and enhances the existing frameworks I work with. It's become an indispensable tool in my data toolkit, helping me to push the boundaries of what's possible in analytics. In recent years, there has been a noticeable increase in the number of citizen developers and citizen data scientists. These are non-technical professionals who are well-versed in their business domains and dabble with technology to create their own solutions. This trend has driven the development of a range of low-code/no-code, visual tools such as Coda, Appian, OutSystems, Shopify, and Microsoft’s Power Platform. At the same time, the role of the data analyst has significantly expanded. More organizations are now entrusting data analysts with responsibilities that were traditionally handled by technology or IT departments. These include tasks like reporting, generating insights, data governance, and even managing the organization’s entire analytics function. This shift reflects the growing importance of data analytics in driving business decisions and operations. 
As a data specialist, I’ve been particularly impressed by how Power BI has evolved in terms of user-friendliness, catering not just to tech-savvy professionals but also to business users. Microsoft has continuously refined Power BI, simplifying complex tasks and making it easy for users of all skill levels to connect, model, and visualize data. This focus on usability is what makes Power BI such a powerful tool, accessible to a wide range of users. For non-technical users, Power BI offers a short learning curve, enabling them to connect to and model data for reporting without needing to rely on Excel, which they might be more familiar with. Once the data is modeled, they can explore a variety of visualization options to derive insights. Moreover, Power BI’s capabilities extend beyond simple reporting, allowing users to scale their work into a full-fledged enterprise business intelligence system. Many data analysts are now looking to deepen their understanding of the broader solutions and technologies that support their work. This is where Microsoft Fabric becomes essential. Fabric extends Power BI by transforming it into a comprehensive, end-to-end analytics platform, incorporating data lakes, data warehouses, data marts, data engineering, data science, and more. With these advanced capabilities, technical work becomes significantly easier, enabling data analysts to take their skills to the next level and realize their full potential in driving analytics solutions. If you're considering a career in analytics and business intelligence, it's crucial to master the fundamentals and gain a comprehensive understanding of the necessary skills. With the field rapidly evolving, staying ahead means equipping yourself with the right knowledge to confidently join this dynamic industry. The Complete Power BI Interview Guide is designed to guide you through this process, providing the essential insights and tools you need to jump on board and thrive in your analytics journey. 
Conclusion

Microsoft Power BI has redefined the analytics landscape by making advanced business intelligence capabilities accessible to a wide audience, from technical professionals to business users. Its seamless integration into modern analytics workflows and its ability to support end-to-end solutions make it an invaluable tool in today’s data-centric environment. With the rise of citizen developers and expanded responsibilities for data analysts, tools like Power BI and platforms like Microsoft Fabric are paving the way for more innovative and comprehensive analytics solutions.

For aspiring professionals, understanding the fundamentals of Power BI and its ecosystem is key to thriving in the analytics field. If you're looking to master Power BI and gain the confidence to excel in interviews and real-world scenarios, The Complete Power BI Interview Guide covers everything from core Power BI concepts to interview preparation and onboarding tips and tricks, and is aimed at beginners and aspiring Power BI job seekers who want to stand out from the competition.

Author Bio

Gogula is an analytics and BI architect born and raised in Sri Lanka. His childhood was spent dreaming, while most of his adulthood was and is spent working with technology. He currently works for a technology and services company based out of Colombo. He has accumulated close to 20 years of experience working with a diverse range of customers across various domains, including insurance, healthcare, logistics, manufacturing, fashion, F&B, K-12, and tertiary education. Throughout his career, he has undertaken multiple roles, including managing delivery, architecting, designing, and developing data & AI solutions. Gogula has received the Microsoft MVP award more than 15 times, has contributed to the development and standardization of Microsoft certifications, and holds over 15 data & AI certifications.
In his leisure time, he enjoys experimenting with and writing about technology, as well as organizing and speaking at technology meetups.

Margarida Barreto

21 Nov 2024

15 min read

Mastering Midjourney AI World for Design Success

Introduction

In today’s rapidly shifting world of design and trends, artificial intelligence (AI) has become a reality! It’s now a creative partner that helps designers and creative minds go further and stand out from the competition. One of the leading AI tools revolutionizing the design process is Midjourney. Whether you’re an experienced professional or a curious beginner, mastering this tool can enhance your creative workflow and open up new possibilities for branding, advertising, and personal projects. In this article, we’ll explore how AI can act as a brainstorming partner, help overcome creative blocks, and provide insights into best practices for unlocking its full potential.

Using AI as my creative colleague

AI tools like Midjourney have the potential to become more than just assistants; they can function as creative collaborators. Often, as designers, we hit roadblocks: times when ideas run dry, or creative fatigue sets in. This is where Midjourney steps in, acting as a colleague who is always available for brainstorming. By generating multiple variations of an idea, it can inspire new directions or unlock solutions that may not have been immediately apparent.

The beauty of AI lies in its ability to combine data insights with creative freedom. Midjourney, for instance, uses text prompts to generate visuals that help spark creativity. Whether you’re building moodboards, conceptualizing ad campaigns, or creating a specific portfolio of images, the tool’s vast generative capabilities enable you to break free from mental blocks and jumpstart new ideas.

Best practices and trends in AI for creative workflows

While AI offers incredible creative opportunities, mastering tools like Midjourney requires understanding its potential and limits. A key practice for success with AI is knowing how to use prompts effectively.
Midjourney allows users to guide the AI with text descriptions or just image input, and the more you fine-tune those prompts, the closer the output aligns with your vision. Understanding the nuances of these prompts, from image weights to blending modes, enables you to achieve optimal results.

A significant trend in AI design is the combination of multiple tools. Midjourney is powerful, but it’s not a one-stop solution. The best results often come from integrating other third-party tools like Kling.ai or Gen 3 Runway. These complementary tools help refine the output, bringing it to a professional level. For instance, Midjourney might generate the base image, but tools like Kling.ai could animate that image, creating dynamic visuals perfect for social media or advertising.

Additionally, staying up to date with AI updates and model improvements is crucial. Midjourney regularly releases new versions that bring refined features and enhancements. Learning how these updates impact your workflow is a valuable skill, as mastering earlier versions helps build a deeper understanding of the tool’s evolution and future potential. The book, The Midjourney Expedition, dives into these aspects, offering both beginners and advanced users a guide to mastering each version of the tool.

Overcoming creative blocks and boosting productivity

One of the most exciting aspects of using AI in design is its ability to alleviate creative fatigue. When you’ve been working on a project for hours or days, it’s easy to feel stuck. Here’s an example of how AI helped me when I needed to create a mockup for a client’s campaign. I wasn’t finding suitable mockups on regular stock photo sites, so I decided to create my own.

1. Go to the Midjourney website, www.midjourney.com, and log in using your Discord or Google account.
2. Go to Create (step 1 in the image below) and enter the prompt in the text box (step 2): 3D rendering of a blank vertical lightbox in front of a wall of a modern building. Outdoor advertising mockup template, front view
3. Click on the icon on the right (step 3) to open the settings box (step 4) and change any settings you want. In this case, I kept the defaults and only changed the orientation to landscape, then pressed Enter on my keyboard.
4. Four images will appear. Choose the one you like the most, or rerun the job until you feel happy with the result.

I got my image, but now I need to add the advertisement I had previously generated on Midjourney, so I can present some ideas for the final mockup to my client.

Click on the image to enlarge it and get more options, then click on Editor at the bottom of the page. In Editor mode, with the erase tool selected, erase the inside of the billboard frame. Next, copy the URL of the image you want to insert in the billboard as a reference, and edit your prompt to:

https://cdn.midjourney.com/urloftheimage.png 3D rendering of a, Fashion cover of "VOGUE" magazine, a beautiful girl in a yellow coat and sunglasses against a blue background inside the frame, vertical digital billboard mockup in front of a modern building with a white wall at night. Glowing light inside the frame., in high resolution and high quality.

Then press Submit. This is the final result.

If you have mastered an editing tool, you can skip this last step and personalize the mockup there instead, for instance in Photoshop.

This is just one example of how AI saved me time and allowed me to create a custom mockup for my client. For many designers, Midjourney serves as another creative tool, always fresh with new perspectives and helping unlock ideas we hadn’t considered. Moreover, AI can save hours of work. It allows designers to skip repetitive tasks, such as creating multiple iterations of mockups or ad layouts. By automating these processes, creatives can focus on refining their work and ensuring that the main visual content serves a purpose beyond aesthetics.
The challenges of writing about a rapidly evolving tool

Writing The Midjourney Expedition was a unique challenge because I was documenting a technology that evolves daily. AI design tools like Midjourney are constantly being updated, with new versions offering improved features and refined models. As I wrote the book, I found myself not only learning about the tool but also integrating the latest advancements as they occurred.

One of the most interesting parts was revisiting the older versions of Midjourney. These models, once groundbreaking, now seem like relics, yet they offer valuable insights into how far the technology has come. Writing about these early versions gave me a sense of nostalgia, but it also highlighted the rapid progress in AI. The same capabilities that amazed us two years ago have been drastically improved, allowing us to create more accurate and visually stunning images.

The book is not just about creating beautiful images; it’s about practical applications. As a communication designer, I’ve always focused on using AI to solve real-world problems, whether for branding, advertising, or storytelling. And I find Midjourney to be a powerful solution for any creative who needs to go one step further in an effective way.

Conclusion

AI is not the future of design, it’s already here! While I don’t believe AI will replace creatives, any creator who masters these tools may replace those who don’t use them. Tools like Midjourney are transforming how we approach creative workflows and even final outcomes, enabling designers to collaborate with AI, overcome creative blocks, and produce better results faster. Whether you're new to AI or an experienced user, mastering these tools can unlock new opportunities for both personal and professional projects. By combining Midjourney with other creative tools, you can push your designs further, ensuring that AI serves as a valuable resource for your creative tasks.
Unlock the full potential of AI in your creative workflows with "The Midjourney Expedition". This book is for creative professionals looking to leverage Midjourney. You’ll learn how to produce stunning AI art, streamline your creative process, and incorporate AI into your work, all while gaining a competitive edge in your industry.

Author Bio

Margarida Barreto is a seasoned communication designer with over 20 years of experience in the industry. As the author of The Midjourney Expedition, she empowers creatives to explore the full potential of AI in their workflows. Margarida specializes in integrating AI tools like Midjourney into branding, advertising, and design, helping professionals overcome creative challenges and achieve outstanding results.

Marco Secchi

18 Nov 2024

10 min read

Artificial Intelligence in Game Development: Understanding Behavior Trees

Introduction

In the wild world of videogames, you'll inevitably encounter a foe that needs to be both engaging and captivating. This opponent isn't just a bunch of nice-to-see polygons and textures; it needs to be a challenge that'll keep your players hooked to the screen.

Let's be honest, as a game developer, crafting a truly engaging opponent is often a challenge that rivals the one your players will face!

In video games, we often use the term Artificial Intelligence (AI) to describe characters that are not controlled by the player, whether they are enemies or friendly entities. There are countless ways to develop compelling characters in video games. In this article, we'll explore one specific solution offered by Unreal Engine: behavior trees.

Note: Citations come from my Artificial Intelligence in Unreal Engine 5 book.

Using the Unreal Shooting Gym Project

For this article, I have created a dedicated project called Unreal Shooting Gym. You can freely download it from GitHub (https://github.com/marcosecchi/unreal-shooting-gym) and open it with Unreal Engine 5.4.

Once opened, you should see a level showing a lab with a set of targets and a small robot armed with a gun (a.k.a. RoboGun), as shown in Figure 1:

Figure 1. The project level.

If you hit the Play button, you should notice the RoboGun rotating toward a target while shooting. Once the target has been hit, the RoboGun will start rotating towards another one. All this logic has been achieved through a behavior tree, so let’s see what it is all about.

Behavior Trees

“In the universe of game development, behavior trees are hierarchical structures that govern the decision-making processes of AI characters, determining their actions and reactions during gameplay.”

Unreal Engine offers a solid framework for handling behavior trees based on two main elements: the blackboard and behavior tree assets.

Blackboard Asset

“In Unreal Engine, the Blackboard [...] acts as a memory space – some sort of brain – where AI agents can read and write data during their decision-making process.”

Open the AI project folder and double-click the BB_Robogun asset. You will be presented with the blackboard which, as you can see from Figure 2, is quite simple to understand.

Figure 2. The AI blackboard

As you can see, there are a couple of variables, called keys, that are used to store a reference to the actor owning the behavior tree (in this case, the RoboGun) and to the target object that the RoboGun will rotate toward.

Behavior Tree Asset

“In Unreal Engine, behavior trees are assets that are edited in a similar way to Blueprints – that is, visually – by adding and linking a set of nodes with specific functionalities to form a behavior tree graph.”

Now, double-click the BT_RoboGun asset located in the AI folder to open the behavior tree. You should see the tree structure depicted in Figure 3:

Figure 3. The AI behavior tree

Although this is a pretty simple behavior logic, there is a lot going on here. First of all, you will notice that there is a Root node; this is where the behavior logic starts.

After that, you will see that there are three gray-colored nodes; these are called composite nodes.

“Composite nodes define the root of a branch and set the rules for its execution.”

Each of them behaves differently, but it is sufficient to say that they control which subtree will be executed; as an example, the Shoot Sequence node will execute all of its subtrees one after the other.

The purple-colored nodes are called tasks. They are basically the leaves of the tree, and their aim is to execute actions. Unreal Engine comes with some predefined tasks, but you can create your own through Blueprints or C++.

As an example, consider the Shoot task depicted in Figure 4:

Figure 4. The Shoot task

In this Blueprint, when the task is executed, it will call the Shoot method, by means of a ShootInterface, and then end the execution with success. For a slightly more complex task, please check the BTTask_SeekTarget asset.

Go back to the behavior tree, and you will notice that the Find Random Target node has a blue-colored section called Is Target Set? This is a decorator.

“Decorators provide a way to add additional functionality or conditions to the execution of a portion of a behavior tree.”

In our case, the decorator checks whether the TargetActor blackboard key is not set; the corresponding task will be executed only if that key is not set, that is, when we have no viable target. If the target is set, this decorator will block task execution and the parent selector node, the Root Selector node, will execute the next subtree.

Environment Queries

Unreal Engine provides an Environment Query System (EQS) framework that allows data collection about the virtual environment. AI agents can then make informed decisions based on the results.

In our behavior tree, we are running an environment query to find a viable target in the Find Random Target task. The query I have created, called EQ_FindTarget, is pretty simple, as it just queries the environment looking for instances of the class BP_Target, as shown in Figure 5:

Figure 5. The environment query

Pawn and Controller

Once you have created your behavior tree, you will need to execute it through an AIController, the class that is used to possess pawns or characters in order to make them proper AI agents. In the Blueprints folder, you can double-click on the RoboGunController asset to check the pretty self-explanatory code depicted in Figure 6:

Figure 6. The character controller code

As you can see, it’s just a matter of running a behavior tree asset.
Easy, isn’t it?

If you open the BP_RoboGun asset, you will notice that, in the Details panel, I have set the AI Controller Class to the RoboGunController; this will make the RoboGun pawn be automatically possessed by the RoboGunController.

Conclusion

This concludes this brief overview of the behavior tree system. I encourage you to explore the possibilities and more advanced features, such as writing your code the C++ way, by reading my new book “Artificial Intelligence in Unreal Engine 5”; I promise you it will be an informative and, sometimes, funny journey!

Author Bio

Marco Secchi is a freelance game programmer who graduated in Computer Engineering at the Polytechnic University of Milan. He is currently a lecturer for the BA in Creative Technologies and the MA in Creative Media Production. He also mentors BA students in their final thesis projects. In his spare time, he reads (a lot), plays (less than he would like), and practices (to some extent) Crossfit.

Lior Gazit, Meysam Ghaffari

13 Nov 2024

10 min read

The Complete Guide to NLP: Foundations, Techniques, and Large Language Models


Paul Iusztin

08 Nov 2024

15 min read

Simplifying AI pipelines using the FTI Architecture


Introduction

Navigating the world of data and AI systems can be overwhelming. Their complexity often makes it difficult to visualize how data engineering, research (data science and machine learning), and production roles (AI engineering, ML engineering, MLOps) work together to form an end-to-end system.

As a data engineer, your work finishes when standardized data is ingested into a data warehouse or lake.
As a researcher, your work ends after training the optimal model on a static dataset and registering it.
As an AI or ML engineer, deploying the model into production often signals the end of your responsibilities.
As an MLOps engineer, your work finishes when operations are fully automated and adequately monitored for long-term stability.

But is there a more intuitive and accessible way to comprehend the entire end-to-end data and AI ecosystem? Absolutely: through the FTI architecture.

Let’s quickly dig into the FTI architecture and apply it to a production LLM & RAG use case.

Figure 1: The mess of bringing structure between the common elements of an ML system.

Introducing the FTI architecture

The FTI architecture proposes a clear and straightforward mind map that any team or person can follow to compute the features, train the model, and deploy an inference pipeline to make predictions.

The pattern suggests that any ML system can be boiled down to these 3 pipelines: feature, training, and inference.

This is powerful, as we can clearly define the scope and interface of each pipeline. Ultimately, we have just 3 instead of 20 moving pieces, as suggested in Figure 1, which is much easier to work with and define.

Figure 2 shows the feature, training, and inference pipelines. We will zoom in on each one to understand its scope and interface.

Figure 2: FTI architecture

Before going into the details, it is essential to understand that each pipeline is a separate component that can run on different processes or hardware.
Thus, each pipeline can be written using a different technology, by a different team, or scaled differently.

The feature pipeline

The feature pipeline takes raw data as input, processes it, and outputs the features and labels required by the model for training or inference.

Instead of directly passing them to the model, the features and labels are stored inside a feature store. Its responsibility is to store, version, track, and share the features.

By saving the features into a feature store, we always have a state of our features. Thus, we can easily send the features to the training and inference pipelines.

The training pipeline

The training pipeline takes the features and labels from the feature store as input and outputs one or more trained models.

The models are stored in a model registry. Its role is similar to that of feature stores, but the model is the first-class citizen this time. Thus, the model registry will store, version, track, and share the model with the inference pipeline.

The inference pipeline

The inference pipeline takes as input the features and labels from the feature store and the trained model from the model registry. With these two, predictions can be easily made in either batch or real-time mode.

As this is a versatile pattern, it is up to you to decide what you do with your predictions. If it’s a batch system, they will probably be stored in a DB. If it’s a real-time system, the predictions will be served to the client who requested them.

The most important thing you must remember about the FTI pipelines is their interface. It doesn’t matter how complex your ML system gets: these interfaces will remain the same.

The final thing you must understand about the FTI pattern is that the system doesn’t have to contain only 3 pipelines. In most cases, it will include more.

For example, the feature pipeline can be composed of a service that computes the features and one that validates the data.
Also, the training pipeline can comprise the training and evaluation components.

Applying the FTI architecture to a use case

The FTI architecture is tool-agnostic, but to better understand how it works, let’s present a concrete use case and tech stack.

Use case: Fine-tune an LLM on your social media data (LinkedIn, Medium, GitHub) and expose it as a real-time RAG application. Let’s call it your LLM Twin.

As we build an end-to-end system, we split it into 4 pipelines:
- The data collection pipeline (owned by the DE team)
- The FTI pipelines (owned by the AI teams)

As the FTI architecture defines a straightforward interface, we can easily connect the data collection pipeline to the ML components through a data warehouse, which, in our case, is a MongoDB NoSQL DB.

The feature pipeline (the second ML-oriented data pipeline) can easily extract standardized data from the data warehouse and preprocess it for fine-tuning and RAG.

The communication between the two is done solely through the data warehouse. Thus, the feature pipeline isn’t aware of the data collection pipeline and how it collected the raw data.
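To make the interfaces concrete, here is a minimal Python sketch of the three FTI pipelines communicating only through a feature store and a model registry. Everything in it is an assumption made for illustration: the `FeatureStore` and `ModelRegistry` classes are hypothetical in-memory stand-ins, the single feature (text length) and the threshold "model" are toys, and none of it reflects the actual LLM Twin code or any specific tool's API.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureStore:
    """Hypothetical in-memory stand-in for a versioned feature store."""
    _versions: dict = field(default_factory=dict)

    def save(self, version, features, labels):
        self._versions[version] = (features, labels)

    def load(self, version):
        return self._versions[version]


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for a model registry."""
    _models: dict = field(default_factory=dict)

    def register(self, name, model):
        self._models[name] = model

    def get(self, name):
        return self._models[name]


def feature_pipeline(raw_rows, store, version):
    # Raw data in, features and labels out; the result goes to the store only.
    features = [[float(len(text))] for text, _ in raw_rows]
    labels = [label for _, label in raw_rows]
    store.save(version, features, labels)


def training_pipeline(store, registry, version, model_name):
    # Reads from the feature store, writes a trained model to the registry.
    features, labels = store.load(version)
    pos = [f[0] for f, y in zip(features, labels) if y == 1]
    neg = [f[0] for f, y in zip(features, labels) if y == 0]
    # Toy "model": a threshold halfway between the two class means.
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    registry.register(model_name, {"threshold": threshold})


def inference_pipeline(store, registry, version, model_name):
    # Reads features from the store and the model from the registry.
    features, _ = store.load(version)
    model = registry.get(model_name)
    return [1 if f[0] > model["threshold"] else 0 for f in features]


store, registry = FeatureStore(), ModelRegistry()
raw = [("hi", 0), ("a much longer document", 1),
       ("ok", 0), ("another long piece of text", 1)]
feature_pipeline(raw, store, version="v1")
training_pipeline(store, registry, version="v1", model_name="toy")
predictions = inference_pipeline(store, registry, version="v1", model_name="toy")
print(predictions)
```

Note that no pipeline function calls another: each one sees only the store, the registry, or both. That is the property that lets each pipeline be rewritten, rescheduled, or scaled independently, as the pattern suggests.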
Figure 3: LLM Twin high-level architecture

The feature pipeline does two things:
- chunks, embeds, and loads the data into a Qdrant vector DB for RAG;
- generates an instruct dataset and loads it into a versioned ZenML artifact.

The training pipeline ingests a specific version of the instruct dataset, fine-tunes an open-source LLM from HuggingFace, such as Llama 3.1, and pushes it to a HuggingFace model registry.

During the research phase, we use a Comet ML experiment tracker to compare multiple fine-tuning experiments and push only the best one to the model registry.

During production, we can automate the training job and use our LLM evaluation strategy or canary tests to check if the new LLM is fit for production.

As the input dataset and output model registry are decoupled, we can quickly launch our training jobs using ML platforms like AWS SageMaker.

ZenML orchestrates the data collection, feature, and training pipelines. Thus, we can easily schedule them or run them on demand.

The end-to-end RAG application is implemented on the inference pipeline side, which accesses fresh documents from the Qdrant vector DB and the latest model from the HuggingFace model registry.

Here, we can implement advanced RAG techniques such as query expansion, self-query, and rerank to improve the accuracy of our retrieval step for better context during the generation step.

The fine-tuned LLM will be deployed to AWS SageMaker as an inference endpoint.
Meanwhile, the rest of the RAG application is hosted as a FastAPI server, exposing the end-to-end logic as REST API endpoints.

The last step is to collect the input prompts and generated answers with a prompt monitoring tool such as Opik to evaluate the production LLM for things such as hallucinations, moderation, or domain-specific problems such as writing tone and style.

Summary

The FTI architecture is a powerful mind map that helps you connect the dots in the complex data and AI world, as illustrated in the LLM Twin use case.

Unlock the full potential of Large Language Models with the "LLM Engineer's Handbook" by Paul Iusztin and Maxime Labonne. Dive deeper into real-world applications, like the FTI architecture, and learn how to seamlessly connect data engineering, ML pipelines, and AI production. With practical insights and step-by-step guidance, this handbook is an essential resource for anyone looking to master end-to-end AI systems. Don’t just read about AI; start building it. Get your copy today and transform how you approach LLM engineering!

Author Bio

Paul Iusztin is a senior ML and MLOps engineer at Metaphysic, a leading GenAI platform, serving as one of their core engineers in taking their deep learning products to production. Along with Metaphysic, with over seven years of experience, he built GenAI, Computer Vision and MLOps solutions for CoreAI, Everseen, and Continental. Paul's determined passion and mission are to build data-intensive AI/ML products that serve the world and educate others about the process. As the Founder of Decoding ML, a channel for battle-tested content on learning how to design, code, and deploy production-grade ML, Paul has significantly enriched the engineering and MLOps community. His weekly content on ML engineering and his open-source courses focusing on end-to-end ML life cycles, such as Hands-on LLMs and LLM Twin, testify to his valuable contributions.


Mr. Denis Rothman

06 Nov 2024

15 min read

How to Face a Critical RAG-driven Generative AI Challenge


This article is an excerpt from the book, "RAG-Driven Generative AI", by Denis Rothman. Explore the transformative potential of RAG-driven LLMs, computer vision, and generative AI with this comprehensive guide, from basics to building a complex RAG pipeline.

Introduction

On a bright Monday morning, Dakota sits down to get to work and is called by the CEO of their software company, who looks quite worried. An important fire department needs a conversational AI agent to train hundreds of rookie firefighters nationwide on drone technology. The CEO looks dismayed because the data provided is spread over many websites around the country. Worse, the management of the fire department is coming over at 2 PM to see a demonstration to decide whether to work with Dakota’s company or not.

Dakota is smiling. The CEO is puzzled. Dakota explains that the AI team can put a prototype together in a few hours and be more than ready by 2 PM, then gets to work. The strategy is to divide the AI team into three sub-teams that will work in parallel on three pipelines based on the reference Deep Lake, LlamaIndex and OpenAI RAG program* they had tested and approved a few weeks back.

Pipeline 1: Collecting and preparing the documents provided by the fire department for this Proof of Concept (POC).
Pipeline 2: Creating and populating a Deep Lake vector store with the first batch of documents while the Pipeline 1 team continues to retrieve and prepare the documents.
Pipeline 3: Index-based RAG with LlamaIndex’s integrated OpenAI LLM performed on the first batch of vectorized documents.

The team gets to work at around 9:30 AM after devising their strategy. The Pipeline 1 team begins by fetching and cleaning a batch of documents. They run Python functions to remove punctuation except for periods and noisy references within the content. Leveraging the automated functions they already have through the educational program, the result is satisfactory.
By 10 AM, the Pipeline 2 team sees the first batch of documents appear on their file server. They run the code they got from the RAG program* to create a Deep Lake vector store and seamlessly populate it with an OpenAI embedding model, as shown in the following excerpt:

    from llama_index.core import StorageContext, VectorStoreIndex
    from llama_index.vector_stores.deeplake import DeepLakeVectorStore

    vector_store_path = "hub://denis76/drone_v2"
    dataset_path = "hub://denis76/drone_v2"
    # overwrite=True will overwrite the dataset, False will append to it
    vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=True)

Note that the path of the dataset points to the online Deep Lake vector store. The fact that the vector store is serverless is a huge advantage because there is no need to manage its size or storage process; the team can begin populating it in a few seconds! Also, to process the first batch of documents, overwrite=True will force the system to write the initial data. Then, starting with the second batch, the Pipeline 2 team can run overwrite=False to append the following documents. Finally, LlamaIndex automatically creates a vector store index:

    # documents is the batch prepared by the Pipeline 1 team
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    # Create an index over the documents
    index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

By 10:30 AM, the Pipeline 3 team can visualize the vectorized (embedded) dataset in their Deep Lake vector store. They create a LlamaIndex query engine on the dataset:

    from llama_index.core import VectorStoreIndex
    vector_store_index = VectorStoreIndex.from_documents(documents)
    …
    vector_query_engine = vector_store_index.as_query_engine(similarity_top_k=k, temperature=temp, num_output=mt)

Note that the OpenAI Large Language Model is seamlessly integrated into LlamaIndex with the following parameters:

k, in this case k=3, specifies the number of documents to retrieve from the vector store. The retrieval is based on the similarity of embedded user inputs and embedded vectors within the dataset.
temp, in this case temp=0.1, determines the randomness of the output. A low value such as 0.1 keeps the generated responses focused and nearly deterministic. A higher value would allow for more diverse responses, which we do not want for this technological conversational agent.
mt, in this case mt=1024, determines the maximum number of tokens in the output.

A cosine similarity function was added to make sure that the outputs matched the sample user inputs:

    from sentence_transformers import SentenceTransformer
    from sklearn.metrics.pairwise import cosine_similarity

    model = SentenceTransformer('all-MiniLM-L6-v2')

    def calculate_cosine_similarity_with_embeddings(text1, text2):
        # Embed both texts, then compare the two embedding vectors
        embeddings1 = model.encode(text1)
        embeddings2 = model.encode(text2)
        similarity = cosine_similarity([embeddings1], [embeddings2])
        return similarity[0][0]

By 11:00 AM, all three pipeline teams are warmed up and ready to go full throttle! While the Pipeline 2 team was creating the vector store and populating it with the first batch of documents, the Pipeline 1 team prepared the next several batches. At 11:00 AM, Dakota gave the green light to run all three pipelines simultaneously. Within a few minutes, the whole RAG-driven generative AI system was humming like a beehive!

By 1:00 PM, Dakota and the three pipeline teams were working on a PowerPoint slideshow with a copilot. Within a few minutes, it was automatically generated based on their scenario. At 1:30 PM, they had time to grab a quick lunch. At 2:00 PM, the fire department management, Dakota’s team, and the CEO of their software company were in the meeting room.

Dakota’s team ran the PowerPoint slideshow and began the demonstration with a simple input:

    user_input="Explain how drones employ real-time image processing and machine learning algorithms to accurately detect events in various environmental conditions."
The response displayed was satisfactory:

Drones utilize real-time image processing and machine learning algorithms to accurately detect events in various environmental conditions by analyzing data captured by their sensors and cameras. This technology allows drones to process visual information quickly and efficiently, enabling them to identify specific objects, patterns, or changes in the environment in real-time. By employing these advanced algorithms, drones can effectively monitor and respond to different situations, such as wildfires, wildlife surveys, disaster relief efforts, and agricultural monitoring with precision and accuracy.

Dakota’s team then showed that the program could track and display the original documents the response was based on. At one point, the fire department’s top manager, Taylor, exclaimed, “Wow, this is impressive! It’s exactly what we were looking for!” Of course, Dakota’s CEO began discussing the number of users, cost, and timelines with Taylor.

In the meantime, Dakota and the rest of the fire department’s team went out to drink some coffee and get to know each other. Fire departments intervene at short notice efficiently for emergencies. So can expert-level AI teams!

* https://github.com/Denis2054/RAG-Driven-Generative-AI/blob/main/Chapter03/Deep_Lake_LlamaIndex_OpenAI_RAG.ipynb

Conclusion

In facing a high-stakes, time-sensitive challenge, Dakota and their AI team demonstrated the power and efficiency of RAG-driven generative AI. By leveraging a structured, multi-pipeline approach with tools like Deep Lake, LlamaIndex, and OpenAI’s advanced models, the team was able to integrate scattered data sources quickly and effectively, delivering a sophisticated, real-time conversational AI prototype tailored for firefighter training on drone technology. Their success showcases how expert planning, resourceful use of AI tools, and teamwork can transform a complex project into a streamlined solution that meets client needs.
This case underscores the potential of generative AI to create responsive, practical solutions for critical industries, setting a new standard for rapid, high-quality AI deployment in real-world applications.

Author Bio

Denis Rothman graduated from Sorbonne University and Paris-Diderot University, and as a student, he wrote and registered a patent for one of the earliest word2vector embeddings and word piece tokenization solutions. He started a company focused on deploying AI and went on to author one of the first AI cognitive NLP chatbots, applied as a language teaching tool for Moët et Chandon (part of LVMH) and more. Denis rapidly became an expert in explainable AI, incorporating interpretable, acceptance-based explanation data and interfaces into solutions implemented for major corporate projects in the aerospace, apparel, and supply chain sectors. His core belief is that you only really know something once you have taught somebody how to do it.


Preetish Kakkar

04 Nov 2024

10 min read

Empowering Modern Graphics Programming using Vulkan


Introduction In the rapidly evolving world of computer graphics, Vulkan has emerged as a powerful and efficient API, revolutionizing how developers approach rendering and compute operations. As the author of "The Modern Vulkan Cookbook," I've had the privilege of diving deep into this technology, exploring its intricacies, and uncovering its potential to solve real-world problems in graphics programming. This book will help you leverage modern graphics programming techniques. You’ll cover a cohesive set of examples that use the same underlying API, discovering Vulkan concepts and their usage in real-world applications.Vulkan, introduced by the Khronos Group in 2016, was designed to address the limitations of older graphics APIs like OpenGL. Its low-overhead, cross-platform nature has made it increasingly popular among developers seeking to maximize performance and gain fine-grained control over GPU resources. One of Vulkan's key strengths lies in its ability to efficiently utilize modern multi-core CPUs and GPUs. By providing explicit control over synchronization and memory management, Vulkan allows developers to optimize their applications for specific hardware configurations, resulting in significant performance improvements. Vulkan Practical Applications Vulkan's impact on solving real-world problems in graphics programming is profound and far-reaching. In the realm of mobile gaming, Vulkan's efficient use of system resources has enabled developers to create console-quality graphics on smartphones, significantly enhancing the mobile gaming experience while conserving battery life. In scientific visualization, Vulkan's compute capabilities have accelerated complex simulations, allowing researchers to process and visualize large datasets in real-time, leading to breakthroughs in fields like climate modeling and molecular dynamics. 
The film industry has leveraged Vulkan's ray tracing capabilities to streamline pre-visualization processes, reducing production times and costs. In automotive design, Vulkan-powered rendering systems have enabled real-time, photorealistic visualizations of car interiors and exteriors, revolutionizing the design review process. Virtual reality applications built on Vulkan benefit from its low-latency characteristics, reducing motion sickness and improving overall user experience in training simulations for industries like healthcare and aerospace. These practical applications demonstrate Vulkan's versatility in solving diverse challenges across multiple sectors, from entertainment to scientific research and industrial design. Throughout my journey writing "The Modern Vulkan Cookbook," I encountered numerous scenarios where Vulkan's capabilities shine in solving practical challenges: GPU-Driven Rendering: Vulkan's support for compute shaders and indirect drawing commands enables developers to offload more work to the GPU, reducing CPU overhead and improving overall rendering efficiency. This is particularly beneficial for complex scenes with dynamic object counts or procedurally generated geometry. Advanced Lighting and Shading: Vulkan's flexibility in shader programming allows for the implementation of sophisticated lighting models and material systems. Techniques like physically based rendering (PBR) and global illumination become more accessible and performant under Vulkan. Order-Independent Transparency: Achieving correct transparency in real-time rendering has always been challenging. Vulkan's support for advanced rendering techniques, such as A-buffer implementations or depth peeling, provides developers with powerful tools to tackle this issue effectively. Ray Tracing: With the introduction of ray tracing extensions, Vulkan has opened new possibilities for photorealistic rendering in real-time applications. 
This has profound implications for industries beyond gaming, including architecture visualization and film production. Challenges and Learning Curves Despite its power, Vulkan comes with a steep learning curve. Its verbose nature and explicit control can be daunting for newcomers. During the writing process, I faced the challenge of breaking down complex concepts into digestible chunks without sacrificing depth. This led me to develop a structured approach, starting with core concepts and gradually building up to advanced techniques. One hurdle was explaining the intricacies of Vulkan's synchronization model. Unlike older APIs, Vulkan requires explicit synchronization, which can be a source of confusion and errors for many developers. To address this, I dedicated significant attention to explaining synchronization primitives and their proper usage, providing clear examples and best practices. The Future of Graphics Programming with Vulkan As we look to the future, Vulkan's role in graphics programming is set to grow even further. The API continues to evolve, with new extensions and features being added regularly. Some exciting areas of development include: Machine Learning Integration: The intersection of graphics and machine learning is becoming increasingly important. Vulkan's compute capabilities make it well-suited for implementing ML algorithms directly on the GPU, opening up possibilities for AI-enhanced rendering techniques. Extended Reality (XR): With the rising popularity of virtual and augmented reality, Vulkan's efficiency and low-latency characteristics make it an excellent choice for XR applications. The integration with OpenXR further solidifies its position in this space. Cross-Platform Development: As Vulkan matures, its cross-platform capabilities are becoming more robust. This is particularly valuable for developers targeting multiple platforms, from high-end PCs to mobile devices and consoles. 
Conclusion Writing "The Modern Vulkan Cookbook" has been an enlightening journey, deepening my appreciation for the power and flexibility of Vulkan. As graphics hardware continues to advance, APIs like Vulkan will play an increasingly crucial role in harnessing this power efficiently. For developers looking to push the boundaries of what's possible in real-time rendering, Vulkan offers a robust toolset. While the learning curve may be steep, the rewards in terms of performance, control, and cross-platform compatibility make it a worthy investment for any serious graphics programmer. Author Bio Preetish Kakkar is a highly experienced graphics engineer specializing in C++, OpenGL, WebGL, and Vulkan. He co-authored "The Modern Vulkan Cookbook" and has extensive experience developing rendering engines, including rasterization and ray-traced pipelines. Preetish has worked with various engines like Unity, Unreal, and Godot, and libraries such as bgfx. He has a deep understanding of the 3D graphics pipeline, virtual/augmented reality, physically based rendering, and ray tracing.


Trevoir Williams

29 Oct 2024

10 min read

Understanding Memory Allocation and Deallocation in the .NET Common Language Runtime (CLR)


IntroductionThis article provides an in-depth exploration of memory allocation and deallocation in the .NET Common Language Runtime (CLR), covering essential concepts and mechanisms that every .NET developer should understand for optimal application performance. Starting with the fundamentals of stack and heap memory allocation, we delve into how the CLR manages different types of data and the roles these areas play in memory efficiency. We also examine the CLR’s generational garbage collection model, which is designed to handle short-lived and long-lived objects efficiently, minimizing resource waste and reducing memory fragmentation. To help developers apply these concepts practically, the article includes best practices for memory management, such as optimizing object creation, managing unmanaged resources with IDisposable, and leveraging profiling tools. This knowledge equips developers to write .NET applications that are not only memory-efficient but also maintainable and scalable.Understanding Memory Allocation and Deallocation in the .NET Common Language Runtime (CLR) Memory management is a cornerstone of software development, and in the .NET ecosystem, the Common Language Runtime (CLR) plays a pivotal role in how memory is allocated and deallocated. The CLR abstracts much of the complexity involved in memory management, enabling developers to focus more on building applications than managing resources. Understanding how memory allocation and deallocation work under the hood can help you write more efficient and performant .NET applications. Memory Allocation in the CLR When you create objects in a .NET application, the CLR allocates memory. This process involves several key components, including the stack, heap, and garbage collector. In .NET, memory is allocated in two main areas: the stack and the heap. Stack Allocation: The stack is a Last-In-First-Out (LIFO) data structure for storing value types and method calls. 
Variables stored on the stack are automatically managed, meaning that when a method exits, all its local variables are popped off the stack, and the memory is reclaimed. This process is very efficient because the stack operates linearly and predictably. Heap Allocation: On the other hand, the heap is used for reference types (such as objects and arrays). Memory on the heap is allocated dynamically, meaning that the size and lifespan of objects are not known until runtime. When you create a new object, memory is allocated on the heap, and a reference to that memory is returned to the stack where the reference type variable is stored. When a .NET application starts, the CLR reserves a contiguous block of memory called the managed heap. This is where all reference-type objects are stored. The managed heap is divided into three generations (0, 1, and 2), which are part of the Garbage Collector (GC) strategy to optimize memory management: Generation 0: Short-lived objects are initially allocated here. This is typically where small and temporary objects reside. Generation 1: Acts as a buffer between short-lived and long-lived objects. Objects that survive a garbage collection in Generation 0 are promoted to Generation 1. Generation 2: Long-lived objects like static data reside here. Objects that survive multiple garbage collections are eventually moved to this generation. When a new object is created, the CLR checks the available space in Generation 0 and allocates memory for the object. If Generation 0 is full, the GC is triggered to reclaim memory by removing objects that are no longer in use. Memory Deallocation and Garbage Collection The CLR’s garbage collector is responsible for reclaiming memory by removing inaccessible objects in the application. 
Unlike manual memory management, where developers must explicitly free memory, the CLR automatically manages this through garbage collection, which simplifies memory management but requires an understanding of how and when this process occurs. Garbage collection in the CLR involves three main steps: Marking: The GC identifies all objects still in use by following references from the root objects (such as global and static references, local variables, and CPU registers). Any objects not reachable from these roots are considered garbage. Relocating: The GC then updates the references to the surviving objects to ensure that they point to the correct locations after compacting memory. Compacting: The memory occupied by the unreachable (garbage) objects is reclaimed, and the remaining objects are moved closer together in memory. This compaction step reduces fragmentation and makes future memory allocations more efficient. The CLR uses the generational approach to garbage collection in .NET, designed to optimize performance by reducing the amount of memory that needs to be examined and reclaimed. Generation 0 collections occur frequently but are fast because most objects in this generation are short-lived and can be quickly reclaimed. Generation 1 collections are less frequent but handle objects that have survived at least one garbage collection. Generation 2 collections are the most comprehensive and involve long-lived objects that have survived multiple collections. These collections are slower and more resource-intensive. Best Practices for Managing Memory in .NET Understanding how the CLR handles memory allocation and deallocation can guide you in writing more efficient code. Here are a few best practices: Minimize the Creation of Large Objects: Large objects (greater than 85,000 bytes) are allocated in a special section of the heap called the Large Object Heap (LOH), which is not compacted due to the overhead associated with moving large blocks of memory. 
Large objects should be used judiciously because they are expensive to allocate and manage.
Use `IDisposable` and `using` Statements: Implementing the `IDisposable` interface and using `using` statements ensures that unmanaged resources are released promptly.
Profile Your Applications: Regularly use profiling tools to monitor memory usage and identify potential memory leaks or inefficiencies.

Conclusion

Mastering memory management in .NET is essential for building high-performance, reliable applications. By understanding the intricacies of the CLR, garbage collection, and best practices in memory management, you can optimize your applications to run more efficiently and avoid common pitfalls like memory leaks and fragmentation.

Effective .NET Memory Management, written by Trevoir Williams, is your essential guide to mastering the complexities of memory management in .NET programming. This comprehensive resource equips developers with the tools and techniques to build memory-efficient, high-performance applications. The book delves into fundamental concepts like:
- Memory Allocation and Garbage Collection
- Memory Profiling and Optimization Strategies
- Low-level Programming with Unsafe Code

Through practical examples and best practices, you’ll learn how to prevent memory leaks, optimize resource usage, and enhance application scalability. Whether you’re developing desktop, web, or cloud-based applications, this book provides the insights you need to manage memory effectively and ensure your .NET applications run smoothly and efficiently.

Author Bio

Trevoir Williams, a passionate software and system engineer from Jamaica, shares his extensive knowledge with students worldwide. Holding a Master’s degree in Computer Science with a focus on Software Development and multiple Microsoft Azure Certifications, his educational background is robust.
His diverse experience includes software consulting, engineering, database development, cloud systems, server administration, and lecturing, reflecting his commitment to technological excellence and education. He is also a talented musician, showcasing his versatility. He has penned works like Microservices Design Patterns in .NET and Azure Integration Guide for Business. His practical approach to teaching helps students grasp both theory and real-world applications.



Miroslaw Staron

25 Oct 2024

10 min read

Mastering Machine Learning: Best Practices and the Future of Generative AI for Software Engineers



Tejas Chopra, Dhirendra Sinha

23 Oct 2024

10 min read

Solving Scalability Challenges in Modern System Design: From Web Apps to GenAI


Introduction

In today’s digital landscape, scalability isn’t just a buzzword; it’s a crucial determinant of success. As the complexity and user base of applications grow, so do the challenges in designing systems that can efficiently handle massive loads. This ongoing challenge of scalability was a key inspiration for my recent book, “System Design Guide for Software Professionals: Build scalable solutions – from fundamental concepts to cracking top tech company interviews”.

The Scalability Crisis

Consider a scenario where a startup’s web application goes viral, resulting in a massive influx of users. This should be a cause for celebration, but instead it becomes a nightmare as the application slows down significantly. According to a 2024 report by Ably, nearly 85% of companies that experience sudden user growth face significant performance issues due to scalability challenges. The root cause often lies in early design decisions, where the rush to market overshadows the need to build for scale.

The Building Blocks Approach

Over the years, I've found that the "building blocks" approach to system design is crucial for building scalable systems. This method leverages established patterns and components to improve scalability. Here are some of the key building blocks discussed in my book:

Distributed Caching: A report from Ahex shows that implementing distributed caching systems like Redis or Memcached can reduce database load by up to 60%, significantly speeding up read operations.

Load Balancing: Modern load balancers are more than just traffic directors; they are intelligent systems that optimize resource utilization. A 2024 NGINX report revealed that effective load balancing can improve server efficiency by 40%, enhancing performance during peak loads.

Database Sharding: As data grows, a single database becomes a bottleneck. Sharding allows horizontal scaling, and companies that implemented it have seen up to a 5x increase in database throughput, as noted in a Google Cloud study.

Message Queues: Asynchronous processing with message queues like Kafka or RabbitMQ can decouple system components and manage traffic spikes. A Gartner report found that this can lead to a 30% reduction in latency during peak usage times.

Content Delivery Networks (CDNs): For global applications, CDNs are essential. According to Cloudflare, CDNs can reduce load times by 50-70% for users across different regions, significantly improving user experience.

Real-World Application: Scaling a Hypothetical E-commerce Platform

Consider an e-commerce platform initially designed as a monolithic application with a single database. This setup worked well for the first 100,000 users, but performance issues began to surface as the user base grew to a million.

Approach:

Microservices Architecture: Decomposing the monolith into microservices allows independent scaling of each component. Amazon famously adopted this approach, enabling it to handle billions of requests daily.

Distributed Caching: Implementing a distributed cache reduced database queries by 70%, as seen in an Akamai case study.

Database Sharding: Sharding the database improved query performance by 80%, according to data from MongoDB.

Message Queues: Using message queues for resource-intensive tasks led to a 25% reduction in system load, as per RabbitMQ's benchmarks.

CDN Deployment: Deploying a global CDN reduced page load times from 3.5 seconds to under 1 second, similar to the optimizations reported by Shopify.

Example Metrics:

Before optimization: The average page load time was 3.5 seconds, with 30% of requests exceeding 5 seconds during peak hours.

After optimization: The average page load time fell to 800ms, with 99% of requests completing in under 2 seconds, even during Black Friday.

Database query volume: Reduced by 65% through effective caching strategies.

Infrastructure costs: Reduced by 40% while handling 5x more daily active users.

The AI/ML Twist: Scaling GenAI Infrastructure

Scaling infrastructure for Generative AI (GenAI) presents unique challenges. For instance, consider a startup offering a GenAI service for content creation. Initially, 10 high-end GPUs served 1,000 daily users, processing about 1 million tokens daily. However, rapid growth pushed the load to 500 million tokens per day for 100,000 users.

Challenges:

GPU Scaling: GPU scaling requires managing expensive, specialized hardware. A BCG report notes that effective GPU utilization can save companies up to 50% in infrastructure costs.

Token Economy: The varying token loads in GenAI apps pose significant challenges. Stanford University says token loads can vary dramatically, complicating resource prediction.

Cost Management: Cloud GPU instances can cost over $10,000/month. AWS reports that optimized GPU management strategies can reduce costs by 30%.

Latency Expectations: Users expect near-instant responses. A study by OpenAI found that sub-second latencies are critical for real-time applications.

Solutions:

Dynamic GPU Allocation: Implementing dynamic GPU allocation can reduce idle times and costs, as observed by Google Cloud.

Request Batching: Grouping user requests can improve GPU throughput by 20%, according to Azure AI.

Model Optimization: Techniques like quantization and pruning can reduce model size by 70% and increase inference speed by 50%, as highlighted in MIT’s research.

Tiered Service Levels: Offering different response time guarantees can optimize resource allocation, as shown by Microsoft Azure.

Distributed Inference: Splitting models across GPUs or using CPU inference can reduce GPU load by 40%, based on Google AI's findings.

Example Metrics:

Cost per 1000 tokens: Reduced from $0.05 to $0.015 through optimized GPU management.

p99 Latency: Improved from 5 seconds to 1.2 seconds.

Infrastructure scaling: Handled 1 billion daily tokens with only a 20x increase in costs, compared to the 100x increase projected by traditional scaling methods.

Beyond Technology: The Human Factor

While technology is critical, fostering a culture of scalability is equally important. A Harvard Business Review article emphasized that companies prioritizing a scalable culture from the start are 50% more likely to sustain growth without operational roadblocks.

Strategies:

Encourage developers to consider scalability from the outset.

Invest in monitoring and observability tools to detect issues early.

Regularly conduct load tests and capacity planning.

Adopt a DevOps culture to break down silos between development and operations.

The Road Ahead

As we move forward, innovations in edge computing, serverless architectures, and large-scale machine learning will continue to push the boundaries of scalability. However, the foundational principles of scalable system design (modularity, redundancy, and efficient resource utilization) remain vital. By mastering these principles, you can build systems that grow and adapt to an ever-changing digital landscape, whether you’re scaling a web application or pioneering generative AI technologies. Remember, scalability is not a destination but a journey, and having the right building blocks makes all the difference.
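
The database sharding building block discussed in this article can be illustrated with a short routing sketch. This is a minimal example under stated assumptions, not the approach of any particular database: the `shard_for` helper, the shard count of four, and the `user-N` keys are all hypothetical names chosen for illustration. It shows one common way to pick a shard, which is to hash a key with a stable hash function and take the result modulo the number of shards.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (hypothetical helper)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # hashlib produces the same digest in every process, unlike the built-in
    # hash(), which is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    # Illustrative user IDs; not data from the article.
    user_ids = [f"user-{i}" for i in range(10_000)]
    counts = Counter(shard_for(uid, 4) for uid in user_ids)
    for shard in sorted(counts):
        print(f"shard {shard}: {counts[shard]} keys")
```

Modulo routing is simple, but changing the shard count remaps most keys to a different shard. Consistent hashing is a commonly used alternative when shards are added or removed often; it is not shown here.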


