
Tech News - Data

1209 Articles

Amazon, Facebook, and Microsoft announce the general availability of ONNX v1.0

Savia Lobo
08 Dec 2017
2 min read
Amazon, Facebook, and Microsoft have rolled out an exciting announcement for developers: the ONNX 1.0 format is now production ready. The Open Neural Network Exchange (ONNX) format enables interoperability between deep learning frameworks such as Caffe2, Apache MXNet, Microsoft Cognitive Toolkit (CNTK), and PyTorch. With this interoperability, version 1.0 lets users move their deep learning models into production at a much faster pace. One can also train a model in one framework (PyTorch, for instance) and carry out inference in another (Microsoft CNTK or Apache MXNet, for example).

Since the initial release of ONNX in September, many communities have become involved and adopted ONNX within their organizations, with Amazon, Facebook, and Microsoft being the major ones. Hardware companies such as Qualcomm, Huawei, and Intel have announced ONNX support for their platforms, giving users the freedom to run their models on different hardware. Working across several frameworks also used to mean integrating optimizations separately within each framework; ONNX lets those optimizations reach more developers.

Tools for ONNX 1.0

Netron: Netron is a viewer for ONNX neural network models. It runs on macOS, Windows, and Linux, and serves models via a Python web server. For a more detailed overview of Netron, visit the GitHub link here.

Net Drawer: The Net Drawer tool visualizes ONNX models. It takes a serialized ONNX model as input and produces a directed graph representation. The output graph contains information on input/output tensors, tensor names, operator types and counts, and so on. To learn more about how Net Drawer works, visit the GitHub link here.

At present, ONNX models are supported in frameworks such as MXNet, Microsoft Cognitive Toolkit, PyTorch, and Caffe2, and there are connectors for other common frameworks and libraries as well. The current version of ONNX is designed with computer vision applications in mind; Amazon, Facebook, and Microsoft, along with the ONNX community and its partners, are working to expand beyond vision applications in future versions of ONNX. To know more about ONNX 1.0 in detail, please visit GitHub or the ONNX website.
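To make the interoperability concrete, here is a minimal sketch (not from the announcement itself) of the export step using PyTorch's built-in ONNX exporter. The model, tensor shapes, and file name are illustrative assumptions.

```python
# Sketch: define a model in PyTorch and export it to the ONNX format,
# so another framework's ONNX importer can load it for inference.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)  # illustrative single-layer model

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
dummy_input = torch.randn(1, 10)  # example input that fixes the graph's shapes

# Serialize the model graph and weights to an .onnx file.
torch.onnx.export(model, dummy_input, "tiny_net.onnx")

# A framework with an ONNX importer (Caffe2, MXNet, CNTK, ...) can then
# load tiny_net.onnx and run inference without retraining.
```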


TensorFlow 1.9.0-rc2 releases!

Natasha Mathur
05 Jul 2018
3 min read
After the 1.9.0-rc0 release early last month, the TensorFlow team is out with another update, 1.9.0-rc2, unveiling major features and updates. The new release includes the latest improvements, bug fixes, and other changes. Let's have a look at the noteworthy features in TensorFlow 1.9.0-rc2:

Key features and improvements
- The tf.keras docs have been updated, namely the new Keras-based "get started" page and the programmer's guide page.
- The layers tf.keras.layers.CuDNNGRU and tf.keras.layers.CuDNNLSTM have been added.
- The Python interface for the TFLite Optimizing Converter has been expanded, and the command-line interface (AKA toco, tflite_convert) is included in the standard pip installation again.
- Data loading and text processing have improved with tf.decode_compressed, tf.string_strip, and tf.strings.regex_full_match.
- Headers used for custom ops have moved from site-packages/external into site-packages/tensorflow/include/external.
- If you are opening empty variable scopes, replace variable_scope('', ...) with variable_scope(tf.get_variable_scope(), ...).

Bug fixes
- tfe.Network has been deprecated; you can now inherit from tf.keras.Model instead.
- Layered variable names have changed under the conditions mentioned below: using tf.keras.layers with custom variable scopes, and using tf.layers in a subclass of tf.keras.Model.
- In tf.data, the DatasetBase::DebugString() method is now const.
- The tf.contrib.data.sample_from_datasets() API for randomly sampling from multiple datasets has been added.
- In tf.contrib, tf.contrib.data.choose_from_datasets() has been added, and tf.contrib.data.make_csv_dataset() now supports line breaks in quoted strings. Two arguments were removed from make_csv_dataset.
- tf.contrib.framework.zero_initializer supports ResourceVariable.
- "constrained_optimization" has been added to tensorflow/contrib.

Other changes
- GCS configuration ops have been added.
- The signature of MakeIterator has been changed to enable propagation of error status.
- The bug in the tf.reduce_prod gradient has been fixed for complex dtypes.
- The benchmark for tf.scan has been updated to match ranges across eager and graph modes.
- An optional args argument has been added to Dataset.from_generator().
- Ids in nn.embedding_lookup_sparse have been made unique, which helps reduce the RPC calls made to look up embeddings when there are repeated ids in the batch.
- tf.train.Checkpoint has been added for reading/writing object-based checkpoints.

To get more information on the new updates and features in the latest TensorFlow 1.9.0-rc2 release, check out the official release notes.

Related reading:
- Use TensorFlow and NLP to detect duplicate Quora questions [Tutorial]
- TensorFlow.js 0.11.1 releases!
- Build and train an RNN chatbot using TensorFlow [Tutorial]
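As a small illustration of two of the tf.data items above, here is a hedged sketch written against the TF 1.9-era graph-mode API; the generator, weights, and dataset contents are made up for the example.

```python
# Sketch: the new `args` argument to Dataset.from_generator and
# tf.contrib.data.sample_from_datasets for random sampling across datasets.
import tensorflow as tf

def gen(limit):
    # Receives `limit` via the dataset's args= argument at construction time.
    for i in range(limit):
        yield i

ds_a = tf.data.Dataset.from_generator(gen, output_types=tf.int64, args=(5,))
ds_b = tf.data.Dataset.from_generator(gen, output_types=tf.int64, args=(3,)).map(
    lambda x: x + 100)

# Randomly interleave samples from the two datasets with a 70/30 weighting.
mixed = tf.contrib.data.sample_from_datasets([ds_a, ds_b],
                                             weights=[0.7, 0.3], seed=42)

iterator = mixed.make_one_shot_iterator()
next_elem = iterator.get_next()
with tf.Session() as sess:
    try:
        while True:
            print(sess.run(next_elem))
    except tf.errors.OutOfRangeError:
        pass  # one of the component datasets has been exhausted
```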


Splunk introduces machine learning capabilities in Splunk Enterprise and Splunk Cloud

Amey Varangaonkar
03 Oct 2018
2 min read
Splunk has kicked off its .conf 2018 conference with a bang, announcing plans to update its premium enterprise products - Splunk Enterprise and Splunk Cloud - with machine learning and analytics capabilities. These products will initially be made available for public use through beta programs, and will allow Splunk customers to work with large-scale data and extract useful insights from it.

The upcoming versions of Splunk Enterprise and Splunk Cloud will include the following features:
- Splunk Business Flow: gives customers the ability to make smarter business decisions by understanding relevant trends and process flows.
- Splunk Data Stream Processor: lets users evaluate, transform, and perform predictive analytics on streaming data.
- Splunk Mobile and Splunk Cloud Gateway: allow Splunk users to interact with the whole suite of Splunk products from their mobile devices.
- Natural Language Processing support for querying Splunk through voice and text commands.
- Splunk Developer Cloud for building effective cloud-native services.
- Splunk Data Fabric Search, which will allow Splunk users to perform large-scale search of real-time data.
- Augmented Reality (AR) interactions with data, such as QR codes and UPC scanning.

In his keynote, Splunk Chief Executive Doug Merritt also announced plans for Splunk MLTK, a Splunk-specific application for running AI use cases. The tool will be able to interact with key open source AI libraries such as TensorFlow and Spark's MLlib. With these capabilities, businesses will be able to perform large-scale analytics by harnessing AI in their Splunk applications and related projects.

Merritt observed that businesses are finding it hard to keep track of the ERP data inside their data centers. "If we really want to be successful, we've got to tap into this sea of data around the world, outside of our walls," he said. "We are in the midst of the data revolution, and these product updates ensure the Splunk platform evolves as our world does to deliver business outcomes no matter the organization, team or dataset."

Read more:
- Why should enterprises use Splunk?
- Splunk leverages AI in its monitoring tools
- Splunk Industrial Asset Intelligence (Splunk IAI) targets Industrial IoT marketplace


Why you should NEVER run a Logistic Regression (unless you have to) from Featured Blog Posts - Data Science Central

Matthew Emerick
14 Oct 2020
2 min read
Hello fellow Data Science Centralists! I wrote a post on my LinkedIn about why you should NEVER run a logistic regression (unless you really have to). The main thrust is:

- There is no theoretical reason why a least squares estimator can't work on a 0/1 outcome. There are very narrow theoretical reasons to run a logistic regression, and unless you fall into those categories it's not worth the time.
- The run time of a logistic regression can be up to 100x longer than an OLS model. If you are doing v-fold cross-validation, save yourself some time.
- The XB's are exactly the same whether you use a logistic or a linear regression. The model specification (features, feature engineering, feature selection, interaction terms) is identical, and this is what you should be focused on anyway.
- Myth: linear regression can only run linear models.
- There is *one* practical reason to run a logistic regression: when the results are all very close to 0 or 1, and you can't hard-code your prediction to 0 or 1 if the linear model falls outside a normal probability range. So if you are pricing an insurance policy based on risk, you can't have a hard-coded 0.000% prediction because you can't price that correctly.

See the video here and slides here. I think it'd be nice to start a debate on this topic!
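To see the argument in action, here is a rough sketch (synthetic data and scikit-learn estimators, none of it from the original post) comparing a least-squares fit and a logistic fit on a 0/1 outcome: the predictions are typically highly correlated, while only the linear model can fall outside [0, 1].

```python
# Sketch: linear probability model vs. logistic regression on a binary outcome.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
true_beta = np.array([0.5, -1.0, 0.25, 0.0, 2.0])
p = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = rng.binomial(1, p)  # synthetic 0/1 target

ols = LinearRegression().fit(X, y)       # least squares on the 0/1 outcome
logit = LogisticRegression().fit(X, y)   # logistic regression

lin_pred = ols.predict(X)                # may fall outside [0, 1]
log_pred = logit.predict_proba(X)[:, 1]  # always in (0, 1)

print("correlation of predictions:", np.corrcoef(lin_pred, log_pred)[0, 1])
print("share of linear predictions outside [0, 1]:",
      np.mean((lin_pred < 0) | (lin_pred > 1)))
```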


Using Cloudera Machine Learning to Build a Predictive Maintenance Model for Jet Engines from Cloudera Blog

Matthew Emerick
14 Oct 2020
6 min read
Introduction

Running a large commercial airline requires the complex management of critical components, including fuel futures contracts, aircraft maintenance, and customer expectations. Airlines in the U.S. alone average about 45,000 daily flights, transporting over 10 million passengers a year (source: FAA). Airlines typically operate on very thin margins, and any schedule delay immediately angers or frustrates customers. Flying is not inherently dangerous, but the consequence of a failure is catastrophic. Airlines therefore have a sophisticated business model that encompasses a culture of streamlined supply chains, predictive maintenance, and unwavering customer satisfaction.

To maximize safety for all passengers and crew members, while also delivering profits, airlines have heavily invested in predictive analytics to gain insight on the most cost-effective way to maintain real-time engine performance. Additionally, airlines ensure availability and reliability of their fleet by leveraging maintenance, repair and overhaul (MRO) organizations, such as Lufthansa Technik. Lufthansa Technik worked with Cloudera to build a predictive maintenance platform to service its fleet of 5,000 aircraft throughout its global network of 800 MRO facilities. Lufthansa Technik extended the standard practice of placing sensors on aircraft engines and enabling predictive maintenance to automate fulfilment solutions. By combining profound airline operation expertise, data science, and engine analytics in a predictive maintenance schedule, Lufthansa Technik can now ensure critical parts are on the ground (OTG) when needed, instead of the entire aircraft being OTG and not producing revenue.

The objective of this blog is to show how to use Cloudera Machine Learning (CML), running on Cloudera Data Platform (CDP), to build a predictive maintenance model based on advanced machine learning concepts.

The Process

Many companies build machine learning models using libraries, whether they are building perception layers for autonomous vehicle operation or modeling a complex jet engine. Kaggle, a site that provides training and test data sets for building machine learning models, hosts simulation data sets from NASA that measure engine component degradation for turbofan jet engines. The models in this blog are built on CML and take as input various engine parameters showing typical sensor values for engine temperature, fuel consumption, vibration, or fuel-to-oxygen mixture (see Fig 1). Note that in this blog the term "failure" does not imply catastrophic failure, but rather that one of the engine's components (pumps, valves, etc.) is not operating to specification. Airlines design their aircraft to operate at 99.999% reliability.

Fig 1: Turbofan jet engine

Step 1: Use the training data to create a model/classifier

First, four test and training data sets for varying conditions and failure modes were organized in preparation for CML (see box 1 in Fig 2). Each set of training data shows the engine parameters per flight while each engine is "flown" until an engine component signals failure, at both sea level and all flight conditions. This data is used to train the model that predicts how many flights a given engine has until failure. For each training set, there is a corresponding test data set that provides data on 100 jet engines at various stages of life, with actual values on which to test the predictive model for accuracy.

Fig 2: Diagram showing how CML is used to build ML training models

Step 2: Iterate on the model to validate and improve effectiveness

CML was used to create a model that estimates the amount of remaining useful life (RUL) for a given engine using the provided test and training data sets. A threshold of one week, the time allowance to place parts on the ground, was planned for a scenario that alerts an airline before a potential engine component failure. Assuming four flights daily, this means the airline would like to know with confidence whether an engine is going to fail within 40 flights. The model was tested for each engine, and the results were classified as true or false for potential failure within 40 flights (see Table 1).

Table 1: Results based on one week of data, i.e. 40 flights

Step 3: Apply an added cost value to the results

With no preventative maintenance, an engine that runs out of life or fails can compromise safety and cost millions more dollars to replace. If an engine is maintained or overhauled before it runs out of life, the cost of overhaul is significantly less. However, if the engine is overhauled too early, there is potential engine life that could still have been utilized. The estimated cost in this model for each of these overhaul outcomes can be seen below (see Fig 3).

Fig 3: Cost-benefit confusion matrix

Conclusion

Using Cloudera Machine Learning to analyze the NASA jet engine simulation data provided by Kaggle, our predictive maintenance model predicted with very high accuracy when an engine was likely to fail or required an overhaul. Combining the cost-benefit analysis with this predictive model against the test data sets suggested significant savings across all applied scenarios. Airline decisions are always made with consideration to safety first and profit second. Predictive maintenance is preferred because it is always the safest choice, and it delivers drastically lower maintenance costs than reactive (engine replacement after failure) or proactive (replacing components before engine replacement) approaches.

Next Steps

To see all this in action, please click on the links below to a few different sources showcasing the process that was created.
- Video: if you'd like to see and hear how this was built, see the video at the link.
- Tutorials: if you'd like to do this at your own pace, see a detailed walkthrough with screenshots and line-by-line instructions on how to set this up and execute it.
- Meetup: if you want to talk directly with experts from Cloudera, please join a virtual meetup to see a live-stream presentation. There will be time for direct Q&A at the end.
- CDP Users Page: to learn about other CDP resources built for users, including additional videos, tutorials, blogs, and events, click on the link.

The post Using Cloudera Machine Learning to Build a Predictive Maintenance Model for Jet Engines appeared first on Cloudera Blog.
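For readers who want to try the classification step (Step 2) themselves, the sketch below is a simplified stand-in for the workflow described above: it labels each cycle as "fails within 40 flights" and trains an off-the-shelf classifier on the sensor columns. The file names, column names, and choice of RandomForestClassifier are assumptions for illustration, not the blog's actual CML code.

```python
# Sketch: binary "fails within 40 flights" classifier on turbofan sensor data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

THRESHOLD = 40  # one week of flights at ~4 flights/day, as in the article

# Hypothetical CSVs: one row per engine per flight cycle, with sensor_* columns
# and a precomputed remaining-useful-life (RUL) column.
train = pd.read_csv("train_FD001_with_rul.csv")
test = pd.read_csv("test_FD001_with_rul.csv")

train["fails_soon"] = (train["RUL"] <= THRESHOLD).astype(int)
sensor_cols = [c for c in train.columns if c.startswith("sensor_")]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(train[sensor_cols], train["fails_soon"])

# True/false outcomes per engine feed the cost-benefit matrix in Fig 3.
pred = clf.predict(test[sensor_cols])
actual = (test["RUL"] <= THRESHOLD).astype(int)
print(confusion_matrix(actual, pred))
```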


Weekly Digest, October 19 from Featured Blog Posts - Data Science Central

Matthew Emerick
18 Oct 2020
1 min read
Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.

Featured Resources and Technical Contributions
- Best Models For Multi-step Time Series Modeling
- Types of Variables in Data Science in One Picture
- A quick demonstration of polling confidence interval calculations using simulation
- Why you should NEVER run a Logistic Regression (unless you have to) +
- Cross-validation and hyperparameter tuning
- 5 Great Data Science Courses
- Complete Hands-Off Automated Machine Learning
- Why You Should Learn Sitecore CMS?

Featured Articles
- AI is Driving Software 2.0… with Minimal Human Intervention
- Data Observability: How to Fix Your Broken Data Pipelines
- Applications of Machine Learning in FinTech
- Where synthetic data brings value
- Why Fintech is the Future of Banking?
- Real Estate: How it is Impacted by Business Intelligence
- Determining How Cloud Computing Benefits Data Science
- Advantages And Disadvantages Of Mobile Banking

Picture of the Week
Source: article flagged with a +

To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click here. Follow us: Twitter | Facebook.

Announcing the Upcoming Evolution of Power BI Premium to enterprise markets and beyond from Microsoft Power BI Blog | Microsoft Power BI

Matthew Emerick
23 Sep 2020
1 min read
Yesterday was filled with announcements of new capabilities of Power BI Premium and even a per user licensing option to gain access to Premium features. This blog post sums those up so you can prepare for a much better experience of owning and using Power BI Premium.


IBM rolls out Deep Learning as a Service (DLaaS) program for AI developers

Savia Lobo
21 Mar 2018
2 min read
On March 20, 2018, IBM launched its brand new Deep Learning as a Service (DLaaS) program for AI developers. Deep Learning as a Service, which runs on IBM Watson, is an experiment-centric model training environment. This means users don't have to worry about getting bogged down with planning and managing training runs themselves; instead, the entire training life cycle is managed automatically, and the results can be viewed in real time and revisited later.

The DLaaS service allows data scientists to train models using the resources they need, paying only for the GPU time. Users can train their neural networks using a range of deep learning frameworks such as TensorFlow, PyTorch, and Caffe, without having to buy and maintain the hardware. To use the service, users simply prepare their data, upload it, begin training, and then download the training results. This can potentially shave days or weeks off training times. For instance, if a single-GPU setup takes nearly a week to train a visual image processing neural network on a couple of million pictures, the same job can be cut down to mere hours with this new cloud solution.

Maintaining deep learning systems also requires manpower: scaling a project from just a few GPUs to a cluster demands an entirely different skill set from training neural networks, and IBM's DLaaS takes on that operational burden as well. To read more about this in detail, check out IBM's blog post.


pandas 0.23 released

Pravin Dhandre
17 May 2018
2 min read
Following the previous major release, v0.22, the pandas contributors have released the next major version, 0.23.0, with numerous new features and enhancements, plus lists of API changes and deprecations. This release adds pivotal support for operations on custom types and extends support for arguments and conversion tasks. The upgraded version also brings performance improvements along with a large number of bug fixes.

New feature highlights in v0.23:
- Round-trippable JSON format with the 'table' orient.
- Instantiation from dicts respects insertion order for Python 3.6+.
- Dependent column arguments for assign.
- Merging/sorting on a combination of columns and index levels.
- Extending pandas with custom types.
- Excluding unobserved categories from groupby.
- Changes to make the output shape of DataFrame.apply consistent.

Bug fixes:
- Bugs related to categorical operations such as merge, the Index constructor, and factorize have been resolved.
- Bugs in numeric operations such as the Series constructor, Index multiplication, and DataFrame flex arithmetic have been fixed.
- Other bugs related to strings, indexing, timezones, and Timedelta are also fixed in this version.

Python's pandas package provides developers with fast, flexible, and expressive data structures that make it easy and intuitive to work with "relational" and "labeled" data. With its continuing stream of feature-packed releases, pandas could soon become the most powerful and flexible open source data analysis and manipulation tool for your data science project. To know more about the API changes, deprecations, and performance improvements, please read the release documentation on GitHub.

Related reading:
- "Pandas is an effective tool to explore and analyze data": An interview with Theodore Petrou
- Working with pandas DataFrames
- Up and Running with pandas
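Two of the headline features are easy to demonstrate. The sketch below uses made-up data to show the round-trippable 'table' JSON orient and dependent column arguments to assign (Python 3.6+).

```python
# Sketch of two pandas 0.23 features: orient='table' JSON round-trips and
# assign() arguments that depend on earlier ones.
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]},
                  index=pd.Index(["x", "y", "z"], name="key"))

# The 'table' orient embeds a schema, so the index and dtypes should survive
# the round trip through JSON.
json_str = df.to_json(orient="table")
restored = pd.read_json(json_str, orient="table")
print(restored.equals(df))  # expected: True

# On Python 3.6+, later assign() keyword arguments may reference earlier ones.
out = df.assign(b=df["a"] * 10, c=lambda d: d["a"] + d["b"])
print(out)
```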


See all the Power BI updates at the Microsoft Business Applications Launch Event from Microsoft Power BI Blog | Microsoft Power BI

Matthew Emerick
24 Sep 2020
1 min read
We’re excited to share all the new innovations we’re rolling out for Power BI to help make creating professional-grade apps even easier. Join us on October 1, 2020, from 9–11 AM Pacific Time (UTC -7), for this free digital event.

AI is Driving Software 2.0… with Minimal Human Intervention from Featured Blog Posts - Data Science Central

Matthew Emerick
15 Oct 2020
6 min read
The future of software development will be model-driven, not code-driven.

Now that my 4th book ("The Economics of Data, Analytics and Digital Transformation") is in the hands of my publisher, it's time to get back to work investigating and sharing new learnings. In this blog I'll take on the subject of Software 2.0. And thanks, Jens, for the push in this direction!

Imagine trying to distinguish a dog from other animals in a photo by coding if-then statements: if the animal has four legs (except when it only has three legs due to an accident), and if the animal has short fur (except when it is a hair dog or a chihuahua with no fur), and if the animal has medium-length ears (except when the dog is a bloodhound), and if the animal has medium-length legs (except when it's a bulldog), and if… Well, you get the point. In fact, it is probably impossible to distinguish a dog from other animals by coding if-then statements. And that's where the power of model-based (AI and Deep Learning) programming shows its strength: it tackles programming problems – such as facial recognition, natural language processing, real-time dictation, image recognition – that are nearly impossible to address using traditional rule-based programming (see Figure 1).

Figure 1: How Deep Learning Works

As discussed in "2020 Challenge: Unlearn to Change Your Frame", most traditional analytics are rule based; the analytics make decisions guided by a pre-determined set of business or operational rules. AI and Deep Learning, however, make decisions based upon the "learning" gleaned from the data. Deep Learning "learns" the characteristics of entities in order to distinguish cats from dogs, tanks from trucks, or healthy cells from cancerous cells (see Figure 2).

Figure 2: Rules-based versus Learning-based Programming

This learning amplifies when the learnings are shared across a collection of similar assets (vehicles, trains, airplanes, compressors, turbines, motors, elevators, cranes), so that the learnings of one asset can be aggregated and backpropagated to the cohort of assets.

The Uncertain Future of Programming

A recent announcement from NVIDIA has the AI community abuzz, and software developers worrying about their future. NVIDIA researchers recently used AI to recreate the classic video game Pac-Man. NVIDIA created an AI model using Generative Adversarial Networks (GANs), called NVIDIA GameGAN, that can generate a fully functional version of Pac-Man without the coding associated with building the underlying game engine. The AI model was able to recreate the game without having to "code" the game's fundamental rules (see Figure 3).

Figure 3: "How GANs and Adaptive Content Will Change Learning, Entertainment and More"

Using AI and Machine Learning (ML) to create software without the need to code the software is driving the "Software 2.0" phenomenon. And it is impressive. An outstanding presentation from Kunle Olukotun titled "Designing Computer Systems for Software 2.0" discussed the potential of Software 2.0 to use machine learning to generate models from data and replace traditional software development (coding) for many applications.

Software 2.0 [1]

Due to the stunning growth of Big Data and IoT, neural networks now have access to enough detailed, granular data to surpass conventional coded algorithms in the predictive accuracy of complex models in areas such as image recognition, natural language processing, autonomous vehicles, and personalized medicine.
Instead of coding software algorithms in the traditional development manner, you train a neural network – leveraging backpropagation and stochastic gradient descent – to optimize the network nodes' weights to deliver the desired outputs or outcomes (see Figure 4).

Figure 4: "Neural Networks: Is Meta-learning the New Black?"

With model-driven software development, it is often easier to train a model than to manually code an algorithm, especially for complex applications like Natural Language Processing (NLP) and image recognition. Plus, model-driven software development is often more predictable in terms of runtime and memory usage than conventional algorithms. For example, Google's Jeff Dean reported that 500 lines of TensorFlow code replaced 500,000 lines of code in Google Translate. And while a thousand-fold reduction is huge, what's more significant is how this code works: rather than half a million lines of static code, the neural network can learn and adapt as biases and prejudices in the data are discovered.

Software 2.0 Challenge: Data Generation

In the article "What machine learning means for software development", Andrej Karpathy states that neural networks have proven they can perform almost any task for which there is sufficient training data. Training neural networks to beat Go or chess or StarCraft is possible because of the large volume of associated training data. It's easy to collect training data for Go or chess, since there are over 150 years of games from which to train the models. And training image recognition programs is facilitated by the 14 million labeled images available on ImageNet.

However, there is not always sufficient data to train neural network models. Significant effort must be invested to create and engineer training data, using techniques such as noisy labeling schemes, data augmentation, data engineering, and data reshaping, to power model-based neural network applications. Welcome to Snorkel. Snorkel (damn cool name) is a system for programmatically building and managing training datasets without manual labeling. Snorkel can automatically develop, clean, and integrate large training datasets using three different programmatic operations (see Figure 5):
- Labeling data through the use of heuristic rules or distant supervision techniques
- Transforming or augmenting the data, for example by rotating or stretching images
- Slicing data into different subsets for monitoring or targeted improvement

Figure 5: Programmatically Building and Managing Training Data with Snorkel

Snorkel is a powerful tool for data labeling and data synthesis. Labeling data manually is very time-consuming; Snorkel addresses this issue programmatically, and the resulting data can be validated by humans looking at samples of it. See "Snorkel Intro Tutorial: Data Augmentation" for more information on its workings; a minimal labeling sketch follows below.
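As a flavor of what programmatic labeling looks like, here is a minimal sketch assuming the Snorkel 0.9-style API; the spam/not-spam task, the heuristics, and the dataframe are invented purely for illustration.

```python
# Sketch: weak supervision with Snorkel labeling functions and a LabelModel.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NOT_SPAM, SPAM = -1, 0, 1

@labeling_function()
def lf_contains_link(x):
    # Heuristic rule: messages with URLs are often spam.
    return SPAM if "http" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_message(x):
    # Heuristic rule: very short messages are usually not spam.
    return NOT_SPAM if len(x.text.split()) < 5 else ABSTAIN

df_train = pd.DataFrame({"text": ["check http://spam.example now",
                                  "ok thanks",
                                  "win money fast http://x.example"]})

# Apply all labeling functions to get a (num_examples x num_lfs) label matrix.
applier = PandasLFApplier(lfs=[lf_contains_link, lf_short_message])
L_train = applier.apply(df=df_train)

# Combine the noisy votes into training labels without manual annotation.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=200, seed=0)
print(label_model.predict(L=L_train))
```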
Software 2.0 Summary

There are certain complex programming problems – facial recognition, natural language processing, real-time dictation, image recognition, autonomous vehicles, precision medicine – that are nearly impossible to address using traditional rule-based programming. In these cases, it is easier to create AI, Deep Learning, and Machine Learning models that can be trained (with large data sets) to deliver the right actions than to code those actions by hand. This is the philosophy of Software 2.0.

Instead of coding software algorithms in the traditional development manner, you train a neural network to optimize its nodes' weights to deliver the desired outputs or outcomes. And model-driven programs have the added advantage of being able to learn and adapt: the neural network can learn and adapt as biases and prejudices in the data are discovered. However, there is not always sufficient data to train neural network models. In those cases, new tools like Snorkel can help: Snorkel can automatically develop, clean, and integrate large training datasets.

The future of software development will be model-driven, not code-driven.

Article Sources:
- Machine Learning vs Traditional Programming
- Designing Computer Systems for Software 2.0 (PDF)
- Software Ate the World, Now AI Is Eating Software: The road to Software 2.0

[1] Kunle Olukotun's presentation and video.


What you need to know to begin your journey to CDP from Cloudera Blog

Matthew Emerick
13 Oct 2020
5 min read
Recently, my colleague published a blog, Build on your investment by Migrating or Upgrading to CDP Data Center, which articulates the great CDP Private Cloud Base features that existing CDH and HDP customers can immediately benefit from. This blog focuses on the process of accelerating your CDP journey to CDP Private Cloud Base, for both professional services engagements and self-service upgrades.

Upgrade with Confidence with Cloudera Professional Services

Cloudera recommends working with Cloudera Professional Services to simplify your journey to CDP Private Cloud Base and get faster time to value. Cloudera PS offers SmartUpgrade to help you efficiently upgrade or migrate to CDP Private Cloud Base with minimal disruption to your SLAs.

Preparing for an Upgrade

Whether you choose to manage your own upgrade process or leverage our Professional Services organization, Cloudera provides the tools you need to get started.

Before you Begin
- Contact your account team to start the process.
- Generate a diagnostic bundle to send information about your cluster to Cloudera Support for analysis. The diagnostic bundle consists of information about the health and performance of the cluster. Learn more about how to send diagnostic bundles.
  1. On a CDH cluster, use Cloudera Manager.
  2. On an HDP cluster, use SmartSense.
- Gather information that the diagnostic tool cannot obtain automatically:
  - What is the primary purpose of the cluster?
  - HDP customers only: which relational database and version is used? How many database objects do you have?
  - Which external APIs are you using?
  - Which third-party software do you use with this cluster?

Create an Upgrade Planning Case

To manage your own upgrade process, follow these steps to file an upgrade planning case and ensure a smooth upgrade experience:
1. Go to the Cloudera Support Hub and click Create Case.
2. Select Upgrade Planning.
3. In Product to Upgrade, select a product from the list. Choices are: Ambari, HDP, HDP & Ambari, CDH, Cloudera Manager, CDH & Cloudera Manager.
4. Are you upgrading to CDP Private Cloud Base? Select Yes or No.
5. What is your target version? Select the version of the product and the version of Cloudera Manager or Ambari.
6. Complete the information about your assets and timeline.
7. Attach the diagnostic bundle you created. Diagnostics will run through your bundle data to identify potential issues that need to be addressed prior to an upgrade.
8. Include the information that you gathered earlier in "Before you Begin".
A case is created.

CDP Upgrade Advisor

The CDP Upgrade Advisor is a utility available on my.cloudera.com for Cloudera customers. This tool evaluates diagnostic data to determine the CDP readiness of your CDH or HDP cluster environment. Running the Upgrade Advisor against the cluster in question is one of your first steps toward adopting CDP, followed by an in-depth conversation with your Cloudera account team to review the specific results. This utility raises awareness of clusters that may present risks during an upgrade to CDP due to, for example, an unsupported version of the operating system currently in use. The Upgrade Advisor focuses on the environment and platform in use, but doesn't take into consideration use cases, the actual cluster data, or workflows in use. Analysis of these critical areas occurs as part of your CDP Journey Workshop with your Cloudera account team and Professional Services.

To run the Upgrade Advisor:
1. Click Upgrade Path to begin the evaluation based on your diagnostic data. The first thing you'll see is a list of your active assets (CDH, DataFlow, HDP, Key Trustee, and CDP assets). The Upgrade Advisor is available only for CDH and HDP environments.
2. Click the respective CDP Upgrade Advisor link on the right-hand side of a CDH or HDP asset to obtain the evaluation results. The Upgrade Advisor determines a recommended upgrade path for the asset in question. You may see a recommendation to upgrade to CDP Data Center (Private Cloud Base), to upgrade to Public Cloud, or not to upgrade at this time due to the environmental failures identified. Beneath the recommendations are details of the cluster asset being evaluated, along with contact details for your Cloudera account team.
3. The Evaluation Details section includes the results of the validation checks performed against your diagnostic data. This includes risks and recommendations, such as a particular service or version of third-party software that will not be supported after an upgrade to CDP. Each category of the evaluation details also features icons that take you to the relevant CDP documentation.

You can also view a video (recommended) about the Upgrade Advisor.

Validate partner certifications

For partner ecosystem support for CDP, you can validate your partner application certifications with this blog: Certified technical partner solutions help customers succeed with Cloudera Data Platform. Please also work with your account team for partner technology applications that are not currently on the certified list.

Learn from Customer Success Stories

Take a deeper look at one customer's journey to CDP in this blog. A financial services customer upgraded their environment from CDH to CDP with Cloudera Professional Services in order to modernize their architecture, ingest data in real time using the new streaming features available in CDP, and make the data available to their users faster than ever before.

Summary

Take the next steps on your journey to CDP now by visiting my.cloudera.com to assess your clusters in the Upgrade Advisor and sign up for a trial of CDP Private Cloud Base. To learn more about CDP, please check out the CDP Resources page.

The post What you need to know to begin your journey to CDP appeared first on Cloudera Blog.


Applications of Machine Learning in FinTech from Featured Blog Posts - Data Science Central

Matthew Emerick
14 Oct 2020
1 min read
Machine learning is a type of artificial intelligence that gives computers the ability to learn without being explicitly programmed. The science behind machine learning is interesting and application-oriented, and many startups have disrupted the FinTech ecosystem with machine learning as their key technology. There are various applications of machine learning used by FinTech companies, falling under different subcategories. Let us look at some of these applications and the companies using them.

Table of contents
- Predictive Analysis for Credit Scores and Bad Loans
- Accurate Decision-Making
- Content/Information Extraction
- Fraud Detection and Identity Management

To read the whole article, with each point detailed, click here.

The pace of innovation never rests: How lessons from our past still influence us today from What's New

Anonymous
03 Dec 2020
7 min read
Andrew Beers, Chief Technology Officer, Tableau | December 3, 2020

Everyone is talking about the need for innovation these days, but there are a lot of questions about the best ways to move forward. Even before the Covid-19 crisis hit, McKinsey found that 92 percent of company leaders thought their business models wouldn't stay viable at the then-current rates of digitization, and the pandemic has only accelerated this need for rapid innovation in the digital world.

As we've helped several customers navigate the uncertainty and find solutions, we always go back to what's at the core of innovation at Tableau. A recent event was the perfect opportunity to pause and look at how we've also weathered uncertainty and increased our pace of innovation throughout the history of Tableau, and how these lessons still serve us today.

The IEEE VIS conference is the premier forum for academic and applied research in visualization, bringing together an international community to share ideas and celebrate innovation every year. It also hands out the Test of Time Awards, honoring work that has endured and remained relevant for at least a decade after its initial publication. This year, Tableau co-founders Chris Stolte and Pat Hanrahan, with their former colleague Diane Tang, received the 20-year Test of Time Award for their groundbreaking research underlying Tableau, a paper titled Polaris: a system for query, analysis and visualization of multidimensional relational databases.

The Polaris user interface with explanations from the paper.

The Polaris paper laid out several key ideas: interactive specification of the visualization using a drag-and-drop user interface; the VizQL query language that described both the visualization and the data query; and the ability to live query relevant data directly from its database, eliminating the need to load data files into memory. In 2003, Chris Stolte, Christian Chabot, and Pat Hanrahan founded Tableau based on this work, and developed Polaris from an academic prototype into the company's first product, Tableau Desktop. Of course, academic prototypes are usually intended to demonstrate an idea, not scale to market. To become viable, they had to transform their prototype into a product that could withstand daily use by many different people with various needs, data, and environments. Transforming a prototype into a product that could be shipped was not a trivial undertaking, as many technical and product challenges stood between our founders and building a successful company.

Dr. Chris Stolte accepting the VIS Test of Time award on behalf of his co-authors, Dr. Pat Hanrahan and Dr. Diane Tang

When I joined Tableau in 2004, I was Tableau's seventh employee, jumping back into a developer role after leading engineering teams at another California-based startup. As a young company, even with an incredible new product, we had to constantly knock down technical challenges and think about how to be different. We focused on giving people new ways of asking and answering questions they couldn't easily address with the existing tools they had on hand. That pushed us to figure out how to extend the original technology we had built around VizQL with even more new capabilities, including maps and geocoding, building statistical models, and supporting multiple data sources through blending and federation. This enabled us to leap ahead and show customers there were different and vastly improved ways of working with their data.
These early lessons in innovation still impact and inform everything we do in engineering and development at Tableau today. Early on, we learned to listen to what our customers were trying to accomplish, but we never stopped at only delivering what they asked of us. We also became customers of our own product by running our development team and the entire company on data analyzed with the product we were building. We didn't want to miss any opportunities for improvements or just build what our customers needed right now. We wanted to reinvent how we could all work with data, then do it again and again, taking ourselves and our customers on a journey past how we were working with data today to a place we thought would be more powerful.

In addition to being our own customer and critic, we knew that as a young, small company we had to demonstrate how Tableau worked, and do it fast. We did this by often demonstrating our product using data that our customers provided. This turned out to be a highly effective way to see the almost immediate impact of connecting people to the meaningful insights in their data. In fact, on one sales engagement our former CEO Christian Chabot gave a demo to about 40 people at a customer site. The demo went well, but the group was distracted. Chabot wondered what it could be and asked for feedback. He was told, rather excitedly, that the team was distracted from his demo by the insights Tableau revealed in their data. We learned early on that giving people new ways to do things opens their eyes to better ways of understanding their businesses.

Today, we continue the search for new and better ways to work with data. Whether we are helping customers analyze their data using natural language with Ask Data, or helping them surface outliers and explain specific points in data by leveraging the power of AI in Explain Data, our work in AI only continues to grow now that we're a part of Salesforce. We recently announced that we are bringing together Tableau with Salesforce's Einstein Analytics to deliver the best analytics platform out there. This new platform will create even more ways for people to make the most of their data, from improving the quality of insights, to helping them act faster, to enabling smarter data prep and easier sharing. This is just the beginning of our innovations to come with Salesforce as a partner.

Additionally, we are even more committed to making analytics accessible for everyone with our initiatives around building a data culture, where data is embedded into the identity of the organization. The World Economic Forum just released a report on the future of jobs with the main message that Covid-19 is accelerating the need for companies to scale remote work, speed up automation, and expand digitization. Old jobs will be lost and the newer ones will demand more advanced digital skills, including using data. In fact, the WEF listed data analysts and scientists as the top in-demand job of the future. Establishing a data culture is not an overnight process, but it's a worthwhile and essential one, and we hope our work, especially in programs to promote data literacy, can help everyone explore, understand, and communicate with data.

All these recent efforts build on what we've strived to do since the beginning of Tableau: give people new ways of working with their data.
The original VizQL work is still the heart of our product and the work we have done since, including building new data platforms and applying good design principles to create highly engaging products. Everything we work on is to build on our mission to help people see and understand their data. We owe a great deal of thanks to the original groundbreaking work in VizQL that has truly stood the test of time.   We’re excited to continue to take that same focus, dedication, and excitement for innovation into the future. Today, as Tableau’s CTO, I’m focused on examining future technologies and product ideas that we can leverage to push our customers’ abilities to work with their data to new heights. And our R&D team remains steadfastly focused on pushing forward with new ideas and how to best turn those into the innovations that will continue to improve Tableau. If you’d like a more in-depth look at our research and development work, please follow our engineering blog. 


Announcing: New Power BI experiences in Microsoft Teams from Microsoft Power BI Blog | Microsoft Power BI

Matthew Emerick
22 Sep 2020
1 min read
The way we work is changing dramatically. It’s more connected, collaborative, and often done remotely. Organizations need tools to help everyone infuse data into every decision. We’re excited to announce new Power BI integrations for Microsoft Teams to make it easier to discover and use data within your organization.  