
Tech News - Data

1209 Articles

#ElectionViz: US TV networks have room for data storytelling improvement from What's New

Anonymous
17 Nov 2020
5 min read
Andy Cotgreave, Technical Evangelist Director | November 17, 2020

Editor’s note: A version of this blog post originally appeared in Nightingale, a publication by the Data Visualization Society.

How was US general election night for you? For me, it was underwhelming. That emotion had nothing to do with the political story, but everything to do with the data storytelling of the US TV networks. My goal on election night was to enjoy and comment on the way they told data stories, which you can find on Twitter under #ElectionViz. I came away disappointed.

The gulf between the charts we find on news websites and on US TV networks is enormous. News websites offer sophisticated experiences, whereas the networks offer, well, not much more than screens dominated by geographical shapes of US counties. There is not a great deal of difference between this year’s screens and those from 1968.

Let’s take CNN as an example. John King, like the anchors on most networks, is an amazing commentator. The anchors’ knowledge of the US political landscape and their ability to narrate events are hugely impressive. Unfortunately, their words were not supported by visuals that would have made it easier for an audience to follow along.

Orange County, Florida map as seen on CNN at 8pm EDT on November 3, 2020.

Almost without fail, when an anchor zooms into a county map, they make three data-driven observations: What is the current split between candidates? How many votes have been counted? How is this different from 2016? And once the narrator has zoomed into a county, the shape or location of the county is no longer a primary piece of information to focus on. Given that, how easy is it to answer the three questions the narrator needs to answer? Not at all easy.

What if we changed the display to focus on the three questions? It could look something like this:

A reimagined screen for CNN.

The geography is now just a small thumbnail, alongside a vote-count progress bar. The candidates’ vote numbers are shown as bars instead of a text box. A slope chart on the right shows the swing from 2016. These are not complex charts: bars and slopes use the most basic building blocks of data visualization, and yet, in an instant, we can see the information the narrator is describing.

All the major networks I followed used the same template: map-driven graphics with little thought for the small touches that could have greatly enhanced the stories being told. Steve Kornacki on MSNBC did take full advantage of the sports-style telestration board, with extensive use of hand-drawn numbers and circles. These enhanced the visual power of his explanations.

Steve Kornacki on MSNBC using annotations to enhance his story.

Beyond the maps, I was surprised at how few visualizations the networks created. There was the occasional line chart, including a nice one from NBC. It was well laid out, with clear labeling and an identifiable data source. My only quibble was the positioning of the party annotations: it’s always nice if you can put the category label at the end of the line itself.

In any TV coverage, it’s only a matter of time before you see a pie chart of some sort. The first I saw was also on NBC. Take a look at it and try to decode the pie chart, paying attention to how many times your eye moves across the chart as you do so. Let me guess: your eyes went on a chaotic path across the chart, from legend to segment, to numbers, to legend, and so on. How about if we showed this as a bar chart instead?

How long does it take to parse the information now? Which is easier and faster to read, the donut or the bars?

As I watched the live feeds of the news websites through the night, it was clear that traditional print media are streets ahead in terms of data storytelling. It’s not because their browser-based graphical displays are complex, or because they appeal to data geeks like me. It’s because they consider the questions audiences have and focus the display on delivering answers as quickly as possible. What seems to be missing is the fundamental question any data storyteller needs to ask: What are the key questions I need to answer, and how can I present the information so that those questions can be answered as easily as possible?

On reflection, I was surprised by the information-design conservatism of the US TV networks. Comparing today’s coverage to the coverage in 1968, other than the addition of color, the displays are still tables of numbers and the odd map. I did #ElectionViz for the UK general election in December 2019, and the visualization maturity of Sky News and the BBC was far ahead of that of the US networks. As the dust settles and we move towards 2024, I would love to see a little more visual sophistication to support the amazing anchors.

"Ok Twitter! It's 3.30am in the UK and I don't think there are any more new charts the media have up their sleeves. So I'm calling it a night for #ElectionViz. Thank you for following, it's been quite the exprience. For now: a whisky to toast you all: pic.twitter.com/4QjZnkHglu" — Andy Cotgreave (@acotgreave) November 4, 2020
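The pie-versus-bars point is easy to reproduce yourself. Below is a minimal matplotlib sketch (not from the original post; the county, candidates, and vote counts are made up for illustration) that renders the same hypothetical split once as a donut and once as horizontal bars, so you can compare how quickly each reads.

```python
import matplotlib.pyplot as plt

# Hypothetical county vote split -- illustrative numbers only.
candidates = ["Candidate A", "Candidate B", "Other"]
votes = [265_000, 245_000, 12_000]

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Donut chart: the reader has to bounce between legend, segments and labels.
ax_pie.pie(votes, labels=candidates, wedgeprops={"width": 0.4},
           autopct="%1.0f%%", startangle=90)
ax_pie.set_title("Donut: harder to compare")

# Horizontal bars: the comparison is immediate, labels sit next to the data.
ax_bar.barh(candidates, votes)
for y, v in enumerate(votes):
    ax_bar.text(v, y, f" {v:,}", va="center")
ax_bar.set_title("Bars: read at a glance")
ax_bar.set_xlabel("Votes counted")

plt.tight_layout()
plt.show()
```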


Titan V is the “most powerful PC GPU ever created,” Nvidia says

Abhishek Jha
08 Dec 2017
4 min read
Nvidia has developed a knack for releasing only the “most powerful” products of late. Never mind; let’s go beyond the verbiage – they are indeed the market leaders in chip making, after all. Their latest announcement, Titan V, has nine times the power of its predecessor, the $1,200 Titan Xp.

As Nvidia CEO Jensen Huang unveiled TITAN V before a gathering of hundreds of elite deep learning researchers at the Neural Information Processing Systems AI conference — better known as NIPS — it marked quite a new era. The new flagship processor is Nvidia’s first HBM2-equipped prosumer graphics card available to the masses, and the chip giant claims Titan V could even transform a PC into an AI supercomputer.

Announcement video: https://www.youtube.com/watch?v=NPrfiOldKf8

Huang said Titan V was tailor-made for “breakthrough discoveries” across high performance computing (HPC) and artificial intelligence. It excels at computational processing for scientific simulation. Its 21.1 billion transistors deliver 110 teraflops of raw horsepower, 9x that of its predecessor, with extreme energy efficiency. “With TITAN V, we are putting Volta into the hands of researchers and scientists all over the world. We broke new ground with its new processor architecture, instructions, numerical formats, memory architecture and processor links,” he said.

Design details

Titan V is based on Nvidia’s “Volta” GV100 graphics processor. It comes with 12 GB of HBM2 memory across a 3,072-bit wide memory interface. The GPU features 5,120 CUDA cores and an additional 640 tensor cores that have been optimized to speed up machine learning workloads.

Titan V’s Volta architecture features a major redesign of the streaming multiprocessor at the center of the GPU. It doubles the energy efficiency of the previous-generation Pascal design, enabling dramatic boosts in performance within the same power envelope. New Tensor Cores designed specifically for deep learning deliver up to 9x higher peak teraflops. With independent parallel integer and floating-point data paths, Volta is also much more efficient on workloads that mix computation and addressing calculations. Its new combined L1 data cache and shared memory unit significantly improves performance while also simplifying programming. Fabricated on a new TSMC 12-nanometer FFN high-performance manufacturing process customized for Nvidia, TITAN V also incorporates Volta’s highly tuned 12 GB HBM2 memory subsystem for advanced memory-bandwidth utilization.
| Reference GeForce | Titan V | Titan Xp | Titan X | GTX 1080 | GTX 1070 | GTX 1060 |
|---|---|---|---|---|---|---|
| Die size | 815 mm² | 471 mm² | 471 mm² | — | — | — |
| GPU | GV100 | GP102-400-A1 | GP102-400-A1 | GP104-400-A1 | GP104-200-A1 | GP106-400-A1 |
| Architecture | Volta | Pascal | Pascal | Pascal | Pascal | Pascal |
| Transistor count | 21 billion | 12 billion | 12 billion | 7.2 billion | 7.2 billion | 4.4 billion |
| Fabrication node | TSMC 12 nm FinFET+ | TSMC 16 nm | TSMC 16 nm | TSMC 16 nm | TSMC 16 nm | TSMC 16 nm |
| CUDA cores | 5,120 | 3,840 | 3,584 | 2,560 | 1,920 | 1,280 |
| SMMs / SMXs | 40 | 30 | 28 | 20 | 15 | 10 |
| ROPs | n/a | 96 | 96 | 64 | 64 | 48 |
| GPU core clock | 1,200 MHz | 1,405 MHz | 1,417 MHz | 1,607 MHz | 1,506 MHz | 1,506 MHz |
| GPU boost clock | 1,455 MHz | 1,582 MHz | 1,531 MHz | 1,733 MHz | 1,683 MHz | 1,709 MHz |
| Memory clock | 1,700 MHz | 2,852 MHz | 2,500 MHz | 1,250 MHz | 2,000 MHz | 2,000 MHz |
| Memory size | 12 GB | 12 GB | 12 GB | 8 GB | 8 GB | 3 GB / 6 GB |
| Memory bus | 3,072-bit | 384-bit | 384-bit | 256-bit | 256-bit | 192-bit |
| Memory bandwidth | 653 GB/s | 547 GB/s | 480 GB/s | 320 GB/s | 256 GB/s | 192 GB/s |
| FP performance | 15 TFLOPS | 12.0 TFLOPS | 11.0 TFLOPS | 9.0 TFLOPS | 6.45 TFLOPS | 4.61 TFLOPS |
| GPU thermal threshold | 91 °C | 97 °C | 94 °C | 94 °C | 94 °C | 94 °C |
| TDP | 250 W | 250 W | 250 W | 180 W | 150 W | 120 W |
| Launch MSRP (reference) | $2,999 | $1,200 | $1,200 | $599/$699 | $379/$449 | $249/$299 |

Source: Guru3D

Availability

Titan V is ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing. Users of Titan V can gain immediate access to the latest GPU-optimized AI, deep learning and HPC software by signing up at no charge for an NVIDIA GPU Cloud account. Priced at a staggering $2,999, Titan V is available to purchase only from the Nvidia stores in participating countries. One final thing about Titan V: its gold-and-black finish looks pretty cool!
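As a sanity check on the headline numbers, peak single-precision throughput can be estimated as CUDA cores × 2 floating-point operations per cycle (one fused multiply-add) × boost clock. The short sketch below is my own back-of-the-envelope calculation, not Nvidia’s methodology; it reproduces the roughly 15 TFLOPS figure in the table from the 5,120 cores and 1,455 MHz boost clock.

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_mhz: float) -> float:
    """Rough peak FP32 throughput: cores * 2 FLOPs per FMA * boost clock."""
    flops = cuda_cores * 2 * boost_clock_mhz * 1e6
    return flops / 1e12

specs = {
    "Titan V":  (5120, 1455),
    "Titan Xp": (3840, 1582),
    "GTX 1080": (2560, 1733),
}

for name, (cores, clock) in specs.items():
    print(f"{name}: ~{peak_fp32_tflops(cores, clock):.1f} TFLOPS")
# Titan V ~14.9, Titan Xp ~12.1, GTX 1080 ~8.9 -- close to the figures in the table.
```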


Tencent creates two Artificial Intelligence agents TSTARBOT that defeat StarCraft II's cheater AI

Sugandha Lahoti
24 Sep 2018
4 min read
AI researchers have always been thrilled about building artificial intelligence bots that can play games as smartly as a human. Back in June, OpenAI Five, the artificial intelligence bot team, smashed amateur humans in the video game Dota 2. In August, the OpenAI Five bots beat a team of former pros at Dota 2. Now, researchers from Chinese tech giant Tencent have developed a pair of AI agents that were able to beat the cheating-level built-in AI of StarCraft II in the full game. StarCraft II is widely considered the most challenging real-time strategy game, due to its large observation space, huge action space, partial observability, simultaneous multi-player game model, and long decision horizon.

The two AI agents

Their research paper describes two AI agents: TStarBot1 and TStarBot2. The first is a macro-level controller agent based on deep reinforcement learning over a flat action structure. It oversees several specific algorithms designed to handle lower-level functions.

TStarBot1: overview of macro actions and reinforcement learning.

TStarBot2, the more robust of the two, is a macro-micro controller consisting of several modules that handle entire facets of the gameplay independently.

TStarBot2: overview of macro-micro hierarchical actions.

The gameplay

Tencent's AI played the game using methods similar to mouse clicks and macros, and played exactly the same way a human player would. The AI saw the game by interpreting video output on a frame-by-frame basis and translating the information into data it could work with. Tencent's AI played StarCraft II with the "fog of war" turned on, meaning it could not see the enemy's units and base until it scouted the map. The TStarBots were designed to imitate the human thought process.

The agents were tested in a 1v1 Zerg-vs-Zerg full game. They played against the built-in AI, ranging from level 1 (the easiest) to level 10 (the hardest). The training used the Abyssal Reef map, known to have thwarted neural-network AIs from winning against StarCraft II's built-in AIs. Interestingly, Tencent trained the agents using only a single CPU; processing the data needed to train the bots on billions of frames of video, however, took a large number of processors. The researchers used 1,920 parallel actors (with 3,840 CPUs across 80 machines) to generate the replay transitions, at a speed of about 16,000 frames per second.

Results

Win rate (in %) of the TStarBot1 and TStarBot2 agents against built-in AIs of various difficulty levels. Each reported win rate is the mean of 200 games with different random seeds, where a tie is counted as 0.5 when calculating the win rate.

TStarBot1 and TStarBot2 also played against several human players, ranging from Platinum to Diamond level in the ranking system of the StarCraft II Battle.net league.

TStarBots vs. human players: each entry shows how many games TStarBot1/TStarBot2 won and lost.

The agents were able to consistently defeat built-in AIs at all levels, showing the effectiveness of the hierarchical action modeling. In another, informal test, the researchers also let the two TStarBots play against each other. TStarBot1 always defeated TStarBot2, because TStarBot1 tends to use the Zergling rush strategy. In StarCraft, a Zerg rush is a strategy where a player using the Zerg race tries to overwhelm the opponent with large numbers of smaller units before the enemy is fully prepared for battle. TStarBot2 lacks an anti-rush strategy and therefore always loses.

In the future, the team plans to build a more carefully hand-tuned action hierarchy to enable the reinforcement learning algorithms to develop better strategies for full StarCraft II games. If you want to dive a little deeper into how the bots work, you can read the research paper.

AI beats human again – this time in a team-based strategy game
OpenAI Five loses against humans in Dota 2 at The International 2018
OpenAI set their eyes to beat Professional Dota 2 team at The International
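The win-rate metric described above (the mean over 200 games, with a tie counted as 0.5) is straightforward to compute. A minimal sketch, using made-up game outcomes rather than the paper's actual results:

```python
def win_rate(outcomes):
    """Mean score over games, where a win counts 1, a tie 0.5 and a loss 0."""
    score = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return sum(score[o] for o in outcomes) / len(outcomes)

# Hypothetical outcomes for 200 games against one built-in difficulty level.
games = ["win"] * 180 + ["tie"] * 10 + ["loss"] * 10
print(f"win rate: {win_rate(games) * 100:.1f}%")   # -> 92.5%
```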


Trick or a treat: Telegram announces its new ‘delete feature’ that deletes messages on both the ends

Amrata Joshi
26 Mar 2019
4 min read
Just two days ago, Telegram announced a new ‘delete feature’ that allows users to delete messages in one-to-one and/or group private chats. The feature lets users selectively delete their own messages or the messages sent by others in the chat; they don’t even need to have written the original message in the first place. The feature is available in Telegram 5.5. So the next time you have a conversation with someone, you can delete the entire chat from your own device and from the device of the person you are chatting with. To delete a message from both ends, a user taps on the message and selects Delete; they are then given the option to delete it just for themselves or for everyone.

Pavel Durov, founder of Telegram, justified the need for the feature. He writes, “Relationships start and end, but messaging histories with ex-friends and ex-colleagues remain available forever. It’s getting worse. Within the next few decades, the volume of our private data stored by our chat partners will easily quadruple.” According to him, users should have control over their digital conversation history. He further added, “An old message you already forgot about can be taken out of context and used against you decades later. A hasty text you sent to a girlfriend in school can come haunt you in 2030 when you decide to run for mayor. We have to admit: Despite all of our progress in encryption and privacy, we have very little actual control of our data. We can’t go back in time and erase things for other people.”

Telegram’s ‘delete’ feature repercussions

This might sound like an exciting feature, but the bigger question is: will it be misused? If someone bullies a user in a Telegram chat or sends something abusive or absurd, the victim probably won’t even have proof to show others if the attacker deletes the messages. Moreover, if a group chat involves a long conversation and a user maliciously deletes a few messages, the other users in the group won’t know. The conversation could be misinterpreted and its flow disrupted; it might end up looking manipulated and cause more trouble. The traces that criminals or attackers leave on the platform can simply be wiped away. The feature gives control to users, but it also quietly opens the door to malicious use.

WhatsApp’s unsend feature seems better in this regard, because it only lets users delete their own messages, which also sounds more legitimate. Also, when a message is deleted, users in the group or private chat are notified about it, unlike in Telegram. The feature could also cause trouble if a user accidentally ends up deleting someone else’s message in a group or private chat, as a deleted message cannot be recovered.

Addressing the potential for misuse, Durov writes, “We know some people may get concerned about the potential misuse of this feature or about the permanence of their chat histories. We thought carefully through those issues, but we think the benefit of having control over your own digital footprint should be paramount.”

Some users are happy with the news and think the feature can save them if they accidentally share something sensitive. A user commented on Hacker News, “As a person who accidentally posted sensitive info on chats, I welcome this feature. I do wish they implemented an indication ‘message deleted’ in the chat to show that the editing took place.” Others think the feature could cause major trouble. Another user commented, “The problem I see with this is that it adds the ability to alter history. You can effectively remove messages sent by the other person, altering context and changing the meaning of the conversation. You can also remove evidence of transactions, work or anything else. I get that this is supposed to be a benefit, but it's also a very significant issue, especially where business is concerned.”

To know more, check out Telegram’s blog post.

Messaging app Telegram’s updated Privacy Policy is an open challenge
The Indian government proposes to censor social media content and monitor WhatsApp messages
Facebook hires top EFF lawyer and Facebook critic as WhatsApp privacy policy manager
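For developers, the same delete-for-everyone behaviour is reachable through Telegram's client API. Below is a minimal, hedged sketch using the third-party Telethon library (not an official Telegram SDK); the API ID, API hash, chat name and message IDs are all placeholders you would replace with your own.

```python
# pip install telethon -- a third-party MTProto client library for Telegram.
from telethon.sync import TelegramClient

API_ID = 12345                # placeholder: obtain your own at my.telegram.org
API_HASH = "your-api-hash"    # placeholder

with TelegramClient("demo-session", API_ID, API_HASH) as client:
    # revoke=True removes the messages for every participant in the chat,
    # while revoke=False (where supported) deletes them only on your side.
    client.delete_messages("some_chat_or_username", [111, 112], revoke=True)
```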


CockroachDB 2.0 is out!

Sunith Shetty
05 Apr 2018
2 min read
CockroachDB has announced version 2.0, with notable new features in its armory. This release brings the project one step closer to making data accessible to everyone.

CockroachDB is an open source, cloud-native SQL database that allows you to build global, large, and resilient cloud applications. It automatically scales, recovers, and repairs itself, allowing the database to survive critical disasters, and it has excellent support for popular orchestration tools such as Kubernetes and Mesosphere DC/OS to simplify and automate operations.

Some of the noteworthy changes in CockroachDB 2.0:

Adjusting to customers' changing requirements: CockroachDB 2.0's support for JSON brings more flexibility and consistency. You can handle both structured and semi-structured data, allowing you to use multiple data models within the same database, and cope better with changing customer requirements and rapid prototyping for large-scale systems. You can now perform in-place transactions and use inverted indexes to accelerate queries on large volumes of data using CockroachDB 2.0's Postgres-compatible JSON.

Performance and scalability improvements: Developers prefer an agile methodology when building real-world applications. CockroachDB 2.0 offers better scalability and performance to deal with growing amounts of data and application needs, along with new operators to handle a growing volume of user requests with ease.

Managing multi-regional workloads: CockroachDB 2.0 has bolstered its efficiency in managing multi-regional data to deliver low-latency applications. A new cluster dashboard helps you visualize globally distributed clusters, so you can keep a close watch on performance bottlenecks and stability problems and adapt to multi-regional needs. You can now bind data to the respective customers' data centers in the same region using a compelling new feature called geo-partitioning.

For the full list of updates, you can refer to the release notes.
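The JSON support mentioned above is exposed through CockroachDB's PostgreSQL-compatible SQL layer, so any Postgres driver works. The following is a minimal sketch, assuming a local, insecure single-node cluster on the default port 26257 and the psycopg2 driver; the table and index names are made up for illustration.

```python
import psycopg2

# Placeholder connection string for a local, insecure test cluster.
conn = psycopg2.connect("postgresql://root@localhost:26257/defaultdb?sslmode=disable")
conn.autocommit = True
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS events (id SERIAL PRIMARY KEY, payload JSONB)")
# Inverted indexes (new in 2.0) accelerate containment queries on JSONB columns.
cur.execute("CREATE INVERTED INDEX IF NOT EXISTS payload_idx ON events (payload)")

cur.execute("INSERT INTO events (payload) VALUES (%s::JSONB)",
            ('{"user": "alice", "action": "login"}',))
cur.execute("SELECT id, payload FROM events WHERE payload @> %s::JSONB",
            ('{"action": "login"}',))
print(cur.fetchall())
```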


How to analyze Salesforce Service Cloud data smarter with Tableau Dashboard Starters from What's New

Matthew Emerick
13 Oct 2020
5 min read
Boris Busov and Maddie Rawding, Solution Engineers | October 13, 2020

The key to building a customer-focused organization is effective customer service. With every touchpoint, there are opportunities to increase operational efficiency and productivity, improve customer satisfaction, and build customer loyalty. High-performing service teams are 1.6 times more likely to use analytics to improve service. However, there are many pain points to getting started: there’s a wealth of data coming from a variety of tools, traditional governance models prevent users from accessing data, and on top of everything it can be hard to find insights in complex data. The result is that customer service teams lack direction on how to improve and make their customers happy.

Every department in an organization should be able to understand its data—and customer service organizations are no exception—which is why we’re excited to add the Service Overview and Case Tracking dashboards to our collection of starters. These two Dashboard Starters are made specifically for Salesforce Service Cloud and are a great launching pad for anyone introducing analytics to their service organization. Salesforce puts customer experience at the center of every conversation, and now you can use the power of Tableau’s new Dashboard Starters to discover insights and make data-driven decisions in your service organization.

Getting started with Service Cloud Dashboard Starters

All of our Dashboard Starters are available on Tableau Online—simply create a new workbook and connect to Dashboard Starters when you’re building a workbook in Tableau Online (to learn how, follow the steps in this Help article). For Service Cloud, select and open the Service Overview and Case Tracking starters. If you don’t have Tableau Online, you can start a free trial. Alternatively, you can download the Dashboard Starters from our website. We have a whole collection of Salesforce Dashboard Starters available for you to try.

Service Overview Dashboard Starter

Use the Service Overview dashboard to get a high-level rundown of your business across important metrics like CSAT, number of cases, response time, and SLA compliance. Select a metric at the top to filter all of the views on the dashboard, and then drill into cases by selecting individual marks on a view.

Figure 1: Monitor and drill into key performance metrics with the Service Overview dashboard.

With the Service Overview dashboard you can come to a consensus on what good customer service looks like in your organization. Each metric has a customizable target on the dashboard that can be used to set benchmarks, and alerts can be set up on Tableau Online to notify users. Filter to see information for different time periods, geographies, and more.

Figure 2: Set target goals to deliver great service.

Case Tracking Dashboard Starter

The Case Tracking dashboard allows agents to monitor their case queue and performance over time. Filter the dashboard to an individual agent and then drill into trends over time to discover potential opportunities for improvement.

Figure 3: Explore performance by agent and monitor trends with the Case Tracking dashboard.

The Case Tracking dashboard also allows you to drill into case details. Add in your Salesforce URL (make sure the parameter is entered correctly) and return to the dashboard. Use the arrow on the case-details worksheet to jump directly into the case in Salesforce.

Figure 4: Drill into case details and then head to Salesforce to take action.

Sharing and customizing the Dashboard Starters

These Service Cloud starters are meant to be a starting point—the possibilities are limitless. You can:

- Publish your starters and then set alerts and subscriptions to share with your teams.
- Add data and create visualizations from other important source systems to enrich your analysis.
- Create new KPIs, build custom calculations, and modify the starters to match how your organization provides service.
- Use custom colors to match your organization's branding.

Plugging your own data into the Dashboard Starters

These starters use sample data. If you want to add your own data, you will need to connect to your Salesforce instance:

1. Select the Data Source tab. A dialog box will appear, prompting you for your application credentials (i.e. your Salesforce username and password).
2. Enter your credentials and log in to your account. You’ll need to confirm with your Salesforce admin that your account has API access to your Salesforce instance.
3. Go back to the dashboard. Tableau Desktop will then create an extract of your data; how long this takes will vary based on how much data you have in your Salesforce instance.
4. If any worksheets appear blank, navigate to the blank worksheet and replace the reference fields by right-clicking on the fields with red exclamation marks as necessary.

With the new Service Overview and Case Tracking Dashboard Starters, service teams can organize and analyze the wealth of data gained from every customer interaction. Being able to elevate insights empowers service teams to take action, resulting in lower call volumes, faster resolution times, and improved workflows. From the release of the Tableau Viz Lightning Web Component to the enhancements in Tableau’s connector to Salesforce, there’s never been a better time to start analyzing your data in Tableau—and these Dashboard Starters are just the beginning of what is to come with Tableau and Salesforce.

Additional resources:
Connect to Salesforce data in Tableau
Documentation on Dashboard Starters for cloud-based data
Overview of Tableau Dashboard Starters
Tableau Viz Lightning Web Component
What is Salesforce Service Cloud?
Tableau resources for customer service teams

Project Hydrogen: Making Apache Spark play nice with other distributed machine learning frameworks

Sunith Shetty
06 Jun 2018
5 min read
The Apache Spark team has revealed a new venture, called Project Hydrogen, during a keynote at the Spark + AI Summit. The project focuses on eliminating the obstacles organizations face when using Spark with deep learning frameworks such as TensorFlow and MXNet.

The rise of Apache Spark is evident from the fact that it is one of the most widely adopted platforms for big data processing, even outperforming other big data frameworks like Hadoop. Thanks to its functionality and services, Apache Spark is one of the most used unified big data frameworks for data processing, SQL querying, real-time streaming analytics, and machine learning. If you want to understand why Apache Spark is gaining popularity, you can check out our interview with Romeo Kienzler, Chief Data Scientist in the IBM Watson IoT worldwide team.

What are the current limitations of Apache Spark?

Apache Spark works fine when you want to work in the big data field. However, the power of Spark’s single framework breaks down when one tries to use third-party distributed machine learning or deep learning frameworks with it. Apache Spark has its own machine learning library, Spark MLlib, which provides noteworthy machine learning functionality. However, looking at the rate of development and research in machine learning and artificial intelligence, data scientists and machine learning practitioners want to tap the power of leading deep learning frameworks such as TensorFlow, Keras, MXNet, Caffe2, and more. The problem is that Apache Spark and deep learning frameworks don’t play well together. As requirements grow and tasks get more advanced, Spark users want to combine Spark with those frameworks to handle complex functionality. The main problem is the incompatibility between the way the Spark scheduler works and the way other machine learning frameworks work.

Do we have any in-house solutions?

Basically, there are two possible options for combining Spark with other deep learning frameworks.

Option 1

Use two different clusters to carry out the individual workloads.

Source: Databricks – Spark + AI Summit 2018

As shown in the diagram, we have two clusters. All the data processing work, which includes data prep, data cleansing, and more, is performed in the Spark cluster, and the final result is written to a storage repository (HDFS or S3). The second cluster, which runs the distributed machine learning framework, can then read the data stored in the repository. This architecture is no longer unified. One of the core challenges is handling these two disparate systems separately, since you need to understand how each system works. Each cluster might follow a different debugging scheme and produce different log files, making the setup very difficult to operate.

Option 2

Some users have tried to tackle the challenges of option 1—operational difficulties, debugging, testing challenges, and more—by running one cluster that hosts both Spark and the distributed machine learning framework. However, the result is not convincing. The main problem with this approach is the inconsistency between how the two systems work. There is a great difference between how Spark tasks are scheduled and how deep learning tasks are scheduled. In the Spark environment, each job is divided into a number of subtasks that are independent of each other. Deep learning frameworks, on the other hand, use different scheduling schemes: depending on the job, they use either MPI or their own custom RPCs for communication, and they assume complete coordination and dependency among their set of tasks.

Source: Databricks – Spark + AI Summit 2018

You can see clear signs of this mismatch when tasks fail. In the Spark model, when any task fails, the Spark scheduler simply restarts that single task and the entire job recovers. In deep learning frameworks, because of the complete dependency between tasks, if any task fails, all the tasks need to be launched again.

Source: Databricks – Spark + AI Summit 2018

The solution: Project Hydrogen

Project Hydrogen aims to solve the challenges of using Spark and other deep learning frameworks together. It is positioned as a potential solution that lets data scientists plug Spark into other deep learning frameworks. The project introduces a new scheduling primitive called gang scheduling, which addresses the dependency challenge introduced by deep learning schedulers, as shown in option 2.

Source: Databricks – Spark + AI Summit 2018

With gang scheduling, scheduling is all-or-nothing: either all the tasks are scheduled in one go, or none of them are scheduled at all. This successfully handles the disparity between how the two systems work.

What’s next?

The Project Hydrogen API is not ready yet; we can expect it to be added to the core Apache Spark project later this year. The primary goal of the project is to embrace all distributed machine learning frameworks in the Spark ecosystem, allowing every other framework to run as smoothly as Apache Spark’s own machine learning library, MLlib. Along with Spark support for deep learning frameworks, the team is also working on speeding up data exchange, which often becomes a bottleneck in machine learning and deep learning tasks. Spark is also working closely with accelerators so that you can comfortably use FPGAs or GPUs in your clusters.

Read more
Apache Spark 2.3 now has native Kubernetes support!
How to win Kaggle competition with Apache SparkML
How to build a cold-start friendly content-based recommender using Apache Spark SQL
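For context on what gang scheduling looks like from user code: this capability later shipped in Spark 2.4 as "barrier execution mode". The sketch below is a minimal, hedged PySpark illustration of the idea (launch all tasks of a stage together and synchronize them), not the Project Hydrogen API described in the keynote.

```python
from pyspark.sql import SparkSession
from pyspark import BarrierTaskContext

spark = SparkSession.builder.appName("gang-scheduling-sketch").getOrCreate()

def train_partition(rows):
    ctx = BarrierTaskContext.get()
    ctx.barrier()  # every task in the stage reaches this point together
    # ...hand this partition to a distributed DL framework (TensorFlow, MXNet)...
    yield f"partition {ctx.partitionId()} processed {len(list(rows))} rows"

rdd = spark.sparkContext.parallelize(range(1000), numSlices=4)
# .barrier() asks the scheduler to launch all 4 tasks at once, or none at all.
print(rdd.barrier().mapPartitions(train_partition).collect())
spark.stop()
```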


Happy Birthday, CDP Public Cloud from Cloudera Blog

Matthew Emerick
13 Oct 2020
4 min read
On September 24, 2019, Cloudera launched CDP Public Cloud (CDP-PC) as the first step in delivering the industry’s first Enterprise Data Cloud.

That Was Then

In the beginning, CDP ran only on AWS, with a set of services that supported a handful of use cases and workload types:

- CDP Data Warehouse: a Kubernetes-based service that allows business analysts to deploy data warehouses with secure, self-service access to enterprise data.
- CDP Machine Learning: a Kubernetes-based service that allows data scientists to deploy collaborative workspaces with secure, self-service access to enterprise data.
- CDP Data Hub: a VM/instance-based service that allows IT and developers to build custom business applications for a diverse set of use cases with secure, self-service access to enterprise data.

At the heart of CDP is SDX, a unified context layer for governance and security that makes it easy to create a secure data lake and run workloads that address all stages of your data lifecycle (collect, enrich, report, serve and predict).

This Is Now

With CDP-PC just a bit over a year old, we thought now would be a good time to reflect on how far we have come since then. Over the past year, we’ve not only added Azure as a supported cloud platform, but we have improved the original services while growing the CDP-PC family significantly.

Improved Services

- Data Warehouse – in addition to a number of performance optimizations, DW has added new features for better scalability, monitoring and reliability, enabling self-service access with security and performance.
- Machine Learning – has grown from a collaborative workbench to an end-to-end production ML platform that enables data scientists to deploy a model or an application to production in minutes, with production-level monitoring, governance and performance tracking.
- Data Hub – has expanded to support all stages of the data lifecycle:
  - Collect – Flow Management (Apache NiFi), Streams Management (Apache Kafka) and Streaming Analytics (Apache Flink)
  - Enrich – Data Engineering (Apache Spark and Apache Hive)
  - Report – Data Engineering (Hive 3), Data Mart (Apache Impala) and Real-Time Data Mart (Apache Impala with Apache Kudu)
  - Serve – Operational Database (Apache HBase), Data Exploration (Apache Solr)
  - Predict – Data Engineering (Apache Spark)

New Services

- CDP Data Engineering (1) – a service purpose-built for data engineers, focused on deploying and orchestrating data transformation using Spark at scale. Behind the scenes, CDE leverages Kubernetes to provide isolation and autoscaling, as well as a comprehensive toolset to streamline ETL processes – including orchestration automation, pipeline monitoring and visual troubleshooting.
- CDP Operational Database (2) – an autonomous, multimodal, autoscaling database environment supporting both NoSQL and SQL. Under the covers, Operational Database leverages HBase and allows end users to create databases without having to worry about infrastructure requirements.
- Data Visualization (3) – an insight and visualization tool, pre-integrated with Data Warehouse and Machine Learning, that simplifies sharing analytics and information among data teams.
- Replication Manager – makes it easy to copy or migrate unstructured (HDFS) or structured (Hive) data from on-premises clusters to CDP environments running in the public cloud.
- Workload Manager – provides in-depth insights into workloads that can be used for troubleshooting failed jobs and optimizing slow workloads.
- Data Catalog – enables data stewards to organize and curate data assets globally, understand where relevant data is located, and audit how it is created, modified, secured and protected.

Each of the above is integrated with SDX, ensuring a consistent mechanism for authentication, authorization, governance and management of data, regardless of where you access your data from and how you consume it. Behind these new features is a supporting cast of many issues resolved, tweaks made and improvements added by hundreds of people to improve the performance, scalability, reliability, usability and security of CDP Public Cloud.

And We Are Not Done

And that was just the first 12 months. Our roadmap includes a number of exciting new features and enhancements to build on our vision of helping you:

- Do Cloud Better: deliver cloud-native analytics to the business in a secure, cost-efficient, and scalable manner.
- Enable Cloud Everywhere: accelerate adoption of cloud-native data services for public clouds.
- Optimize the Data Lifecycle: collect, enrich, report, serve, and model enterprise data for any business use case in any cloud.

Learn More, Keep in Touch

We invite you to learn more about CDP Public Cloud for yourself by watching a product demo or by taking the platform for a test drive (it’s free to get started). Keep up with what’s new in CDP-PC by following our monthly release summaries.

(1) Currently available on AWS only
(2) Technical Preview on AWS and Azure
(3) Data Visualization is in Tech Preview on AWS and Azure

The post Happy Birthday, CDP Public Cloud appeared first on Cloudera Blog.


Introducing Deep TabNine, a language-agnostic autocompleter based on OpenAI’s GPT-2

Bhagyashree R
23 Jul 2019
3 min read
TabNine is a language-agnostic autocompleter that leverages machine learning to provide responsive, reliable, and relevant code suggestions. In a blog post shared last week, Jacob Jackson, TabNine's creator, introduced Deep TabNine, which uses deep learning to significantly improve suggestion quality.

What is Deep TabNine?

Deep TabNine is based on OpenAI's GPT-2 model, which uses the Transformer architecture. While this architecture was intended for solving problems in natural language processing, Deep TabNine uses it to understand the English in code; for instance, the model can negate words when completing an if/else statement. During training, the model's goal is to predict the next token given the tokens that come before it. Trained on nearly 2 million files from GitHub, Deep TabNine comes with pre-existing knowledge instead of learning only from a user’s current project. Additionally, the model also refers to documentation written in natural language to infer function names, parameters, and return types. It is capable of using small clues that are difficult for a traditional tool to access. For instance, it understands that the return type of app.get_user() is assumed to be an object with setter methods, while the return type of app.get_users() is assumed to be a list.

How can you access Deep TabNine?

Although integrating a deep learning model comes with several benefits, using one demands a lot of computing power. Jackson mentioned that running it on a laptop will not deliver the low latency that TabNine's users are accustomed to. As a solution, they are offering TabNine Cloud (beta), a service that lets users run GPU-accelerated autocompletion on TabNine's servers. To get access to TabNine Cloud, you can sign up here. However, many developers prefer to keep their code on their own machines. To ensure the privacy and security of your code, the TabNine team is working on the following options:

- A reduced-size model, promised for the future, that can run on a laptop with reasonable latency for individual developers.
- An option for enterprises to license the model and run it on their own hardware.
- A custom model, trained on request, that understands the unique patterns and style specific to an enterprise's codebase.

Developers have already started beta testing it and are quite impressed:

https://twitter.com/karpathy/status/1151887984691576833
https://twitter.com/aruslan/status/1151914744053297152
https://twitter.com/Frenck/status/1152634220872916996

You can check out the official announcement by TabNine to know more in detail.

Implementing autocompletion in a React Material UI application [Tutorial]
Material-UI v4 releases with CSS specificity, Classes boilerplate, migration to Typescript and more
Conda 4.6.0 released with support for more shells, better interoperability among others
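To make the "predict the next token" objective concrete, here is a small, hedged sketch using the publicly released GPT-2 weights via the Hugging Face transformers library. This is not TabNine's model or training data; it only illustrates the underlying next-token idea on a code-like prompt.

```python
# pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "def get_users(app):\n    return app."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits[0, -1]       # scores for the next token only

top_ids = torch.topk(logits, k=5).indices
print([tokenizer.decode(int(t)) for t in top_ids])  # top-5 candidate continuations
```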


Torrent-Paradise uses IPFS for decentralization, possible alternative to Pirate Bay

Melisha Dsouza
22 Jan 2019
2 min read
A developer known by the handle ‘Urban Guacamole’ has launched a new version of Torrent-Paradise, powered by IPFS (the InterPlanetary File System), that provides decentralized torrent searching. This is in contrast to The Pirate Bay, which has a centralized nature and has been suffering from regular downtime. The system works much like BitTorrent and makes it possible to download files without the need for a central host.

Even though the BitTorrent protocol itself is decentralized, TorrentFreak (TF) notes that the ecosystem surrounding it has some weak spots. Torrent sites that use centralized search engines face outages and takedowns, disrupting service to users. In a statement to TF, Urban says: “I feel like decentralizing search is the natural next step in the evolution of the torrent ecosystem. File sharing keeps moving in the direction of more and more decentralization, eliminating one single point of failure after another.”

Urban further explains that each update of Torrent-Paradise is an IPFS hash, so the site is always available as long as someone is seeding it, even if the servers are down. Decentralization lets search results be shared between large numbers of systems, which should help the site's performance as well as improve stability and privacy. According to BetaNews, by using IPFS, Torrent-Paradise frees itself from the risk of servers going down and also becomes resistant to blocking and censorship.

A few issues with using IPFS, as highlighted by TF: it needs to be installed and configured for a server to become a node. IPFS gateways like Cloudflare's allow anyone to access sites such as Torrent-Paradise through a custom URL, but this doesn't help share the site. Another issue is that the site relies on a static index, which is only updated once a day rather than in near real-time. The regular Torrent-Paradise website is still accessible to all, alongside the new ad-free IPFS version.

Could Torrent-Paradise be an alternative to The Pirate Bay? We will leave that open for discussion! Head over to TorrentFreak for more insights on this news.

BitTorrent’s traffic surges as the number of streaming services explode
MIDI 2.0 prototyping in the works, 35 years after launch of the first version
Hyatt Hotels launches public bug bounty program with HackerOne
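The "each update is an IPFS hash" model means any snapshot of the site can be fetched by its content identifier (CID) from any IPFS node or public gateway. A minimal sketch follows; the CID is a placeholder, not Torrent-Paradise's real hash, and ipfs.io is just one of several public gateways.

```python
import requests

# Placeholder CID -- substitute a real content identifier before running.
cid = "<your-cid-here>"
gateway_url = f"https://ipfs.io/ipfs/{cid}"

resp = requests.get(gateway_url, timeout=30)
resp.raise_for_status()
print(resp.text[:500])   # first 500 characters of the content behind that hash
```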

Introducing a new way to bring Tableau analytics into Salesforce from What's New

Matthew Emerick
18 Sep 2020
4 min read
Geraldine Zanolli, Developer Evangelist | September 18, 2020

At Tableau, we believe that our customers need analytics in their workflows, and Salesforce customers are no exception. While there are existing ways for our customers to embed Tableau content inside Salesforce, the new Tableau Viz Lightning web component makes it easy to integrate Tableau visualizations into Salesforce in just a few clicks. Today, we are excited to release the Tableau Viz Lightning web component, available now on Salesforce AppExchange. With it, any Salesforce admin or developer can integrate any Tableau dashboard into a Salesforce Lightning page.

You may have already seen the component used in Work.com, Salesforce’s offering of advisory services and technology solutions to help companies and communities safely reopen in the COVID-19 environment. The Work.com team used the Tableau Viz Lightning web component to add the Global COVID-19 Tracker dashboard to the Workplace Command Center, a single source of truth that gives organizations a 360-degree view of return-to-work readiness across all of their locations, employees, and visitors.

“Surfacing Tableau dashboards in the Command Center illustrates the power and convenience of the ‘single pane of glass,’” shared Xander Mitman, Director of Product Management at Salesforce. “Best of all, if customers want to add more Tableau dashboards—either public or proprietary—it only takes a few clicks to make those changes. The Tableau Viz Lightning web component makes it fast and easy for business technology teams to take an agile approach to figure out what makes end users most efficient and productive.”

The Work.com team used the Tableau Viz Lightning web component to add the Global COVID-19 Tracker dashboard.

Easy embedding for Salesforce

Any Salesforce user can visit AppExchange and install the Tableau Viz Lightning web component in their org. With three clicks, the Lightning web component is ready to be used in Salesforce. Salesforce admins can then drag and drop the component onto a page. Users will need to get the URL of the visualization they want to embed from Tableau Online, Server, or Public, and can then customize the look and feel by adjusting the height or showing the Tableau toolbar. Furthermore, to keep users in their workflow, two filtering options are available on record pages (such as an Account or Opportunity page):

- Context filtering allows users to filter the visualization based on the record they are currently viewing.
- Advanced filtering lets users define their own filter based on the visualization they are embedding and the information on the page.

To learn more about how to configure the Tableau Viz Lightning web component, check out Embed Tableau Views in Salesforce in Help. In the same spirit of making the user experience easier, we also released new help articles on setting up single sign-on (SSO) for the Tableau Viz Lightning web component, which currently supports SAML. For our fully native and deeply integrated analytics solution for Salesforce, check out Einstein Analytics.

Developers, build your own solution on top of the Tableau Viz Lightning web component

Each deployment of Tableau + Salesforce is different—different content, consumers, use cases, etc. We recognize that the Tableau Viz Lightning web component isn't a one-size-fits-all solution, which is why you can access the full Lightning web component as an open-source project. Developers can build on top of our Tableau Viz Lightning web component by embedding it in their own Lightning web component. One advantage of using composition to build a component is that developers benefit from the improvements we make to the Tableau Viz Lightning web component without having to change their code. We released the Tableau Viz Lightning web component with one code sample available on GitHub—look for more coming soon. Install the Tableau Viz Lightning web component from AppExchange to get Tableau inside Salesforce today!


Cloudera acquires Eventador to accelerate Stream Processing in Public & Hybrid Clouds from Cloudera Blog

Matthew Emerick
12 Oct 2020
3 min read
We are thrilled to announce that Cloudera has acquired Eventador, a provider of cloud-native services for enterprise-grade stream processing. Eventador, based in Austin, TX, was founded by Erik Beebe and Kenny Gorman in 2016 to address a fundamental business problem – make it simpler to build streaming applications built on real-time data. This typically involved a lot of coding with Java, Scala or similar technologies. Eventador simplifies the process by allowing users to use SQL to query streams of real-time data without implementing complex code.

We believe Eventador will accelerate innovation in our Cloudera DataFlow streaming platform and deliver more business value to our customers in their real-time analytics applications. The DataFlow platform has established a leading position in the data streaming market by unlocking the combined value and synergies of Apache NiFi, Apache Kafka and Apache Flink. We recently delivered all three of these streaming capabilities as cloud services through Cloudera Data Platform (CDP) Data Hub on AWS and Azure. We are especially proud to help grow Flink, the software, as well as the Flink community.

The next evolution of our data streaming platform is to deliver a seamless cloud-native DataFlow experience where users can focus on creating simple data pipelines that help ingest data from any streaming source, scale the data management with topics, and generate real-time insights by processing the data on the pipeline with an easy-to-use interface. Our primary design principles are self-service, simplicity and hybrid. And, like all CDP data management and analytic cloud services, DataFlow will offer a consistent user experience on public and private clouds – for real hybrid cloud data streaming.

The Eventador technology’s ability to simplify access to real-time data with SQL, and their expertise in managed service offerings, will accelerate our DataFlow experience timelines and make DataFlow a richer streaming data platform that can address a broader range of business use cases. With the addition of Eventador we can deliver more customer value for real-time analytics use cases including:

- Inventory optimization, predictive maintenance and a wide variety of IoT use cases for operations teams.
- Personalized promotions and customer 360 use cases for sales and marketing teams.
- Risk management and real-time fraud analysis for IT and finance teams.

To summarize, the addition of the Eventador technology and team to Cloudera will enable our customers to democratize cross-organizational access to real-time data. We encourage you to come with us on this journey as we continue to innovate the data streaming capabilities within the Cloudera Data Platform as part of the DataFlow experience. We are excited about what the future holds and we warmly welcome the Eventador team into Cloudera. Stay tuned for more product updates coming soon!

The post Cloudera acquires Eventador to accelerate Stream Processing in Public & Hybrid Clouds appeared first on Cloudera Blog.
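To give a flavor of what "SQL over a stream" means in practice, here is a minimal, hedged sketch using PyFlink's Table API against Flink's built-in data generator. This is generic Apache Flink usage, not Eventador's or Cloudera DataFlow's product API; the table name and field definitions are made up for illustration.

```python
# pip install apache-flink
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# A synthetic, unbounded stream of click events from the datagen connector.
t_env.execute_sql("""
    CREATE TABLE clicks (
        user_id INT,
        url     STRING
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '5',
        'fields.user_id.min' = '1',
        'fields.user_id.max' = '3'
    )
""")

# A continuous aggregation expressed purely in SQL -- no Java or Scala required.
t_env.execute_sql(
    "SELECT user_id, COUNT(*) AS clicks FROM clicks GROUP BY user_id"
).print()
```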


How-to: Index Data from S3 via NiFi Using CDP Data Hubs from Cloudera Blog

Matthew Emerick
15 Oct 2020
10 min read
About this Blog Data Discovery and Exploration (DDE) was recently released in tech preview in Cloudera Data Platform in public cloud. In this blog we will go through the process of indexing data from S3 into Solr in DDE with the help of NiFi in Data Flow. The scenario is the same as it was in the previous blog but the ingest pipeline differs. Spark as the ingest pipeline tool for Search (i.e. Solr) is most commonly used for batch indexing data residing in cloud storage, or if you want to do heavy transformations of the data as a pre-step before sending it to indexing for easy exploration. NiFi (as depicted in this blog) is used for real time and often voluminous incoming event streams that need to be explorable (e.g. logs, twitter feeds, file appends etc). Our ambition is not to use any terminal or a single shell command to achieve this. We have a UI tool for every step we need to take.  Assumptions The prerequisites to pull this feat are pretty similar to the ones in our previous blog post, minus the command line access: You have a CDP account already and have power user or admin rights for the environment in which you plan to spin up the services. If you do not have a CDP AWS account, please contact your favorite Cloudera representative, or sign up for a CDP trial here. You have environments and identities mapped and configured. More explicitly, all you need is to have the mapping of the CDP User to an AWS Role which grants access to the specific S3 bucket you want to read from (and write to). You have a workload (FreeIPA) password already set. You have  DDE and  Flow Management Data Hub clusters running in your environment. You can also find more information about using templates in CDP Data Hub here. You have AWS credentials to be able to access an S3 bucket from Nifi. Here is documentation on how to acquire AWS credentials and how to create a bucket and upload files to it. You have a sample file in an S3 bucket that is accessible for your CDP user.  If you don’t have a sample file, here is a link to the one we used. Note: the workflow discussed in this blog was written with the linked ‘films.csv’ file in mind. If you use a different one, you might need to do things slightly differently, e.g. when creating the Solr collection) Pro Tip for the novice user: to download a CSV file from GitHub, view it by clicking the RAW button and then use the Save As option in the browser File menu. Workflow To replicate what we did, you need to do the following: Create a collection using Hue. Build a dataflow in NiFi. Run the NiFi flow. Check if everything went well NiFi logs and see the indexed data on Hue. Create a collection using Hue You can create a collection using the solrctrl CLI. Here we chose to use HUE in the DDE Data Hub cluster: 1.In the Services section of the DDE cluster details page, click the Hue shortcut. 2. On the Hue webUI select Indexes> + ‘Create index’ > from the Type drop down select ‘Manually’> Click Next. 3. Provide a collection Name under Destination (in this example, we named it ‘solr-nifi-demo’). 4. Add the following  Fields, using the + Add Field button: Name Type name text_general initial_release_date date 5. Click Submit. 6. To check that the collection has indeed been created, go to the Solr webUI by clicking the Solr Server shortcut on the DDE cluster details page. 7. 
Once there, you can either click on the Collections sidebar option or click Select an option > in the drop down you will find the collection you have just created (‘solr-nifi-demo’ in our example) > click the collection > click Query > Execute Query. You should get something very similar: {  "responseHeader":{    "zkConnected":true,    "status":0,    "QTime":0,    "params":{      "q":"*:*",      "doAs":"<querying user>",      "_forwardedCount":"1",      "_":"1599835760799"}},  "response":{"numFound":0,"start":0,"docs":[]   }} That is, you have successfully created an empty collection. Build a flow in NiFi Once you are done with collection creation, move over to Flow management Data Hub cluster. In the Services section of the Flow Management cluster details page, click the NiFi shortcut. Add processors Start adding processors by dragging the ‘Processor’ button to the NiFi canvas. To build the example workflow we did, add the following processors: 1. ListS3 This processor reads the content of the S3 bucket linked to your environment. Configuration: Config name Config value Comments Name Check for new Input Optional Bucket nifi-solr-demo The S3 bucket where you uploaded your sample file Access Key ID <my access key> This value is generated for AWS users. You may generate and download a new one from AWS Management Console > Services > IAM > Users > Select your user > Security credentials > Create access key. Secret Access Key <my secret access key> This value is generated for AWS users, together with the Access Key ID. Prefix input-data/ The folder inside the bucket where the input CSV is located. Be careful of the “/” at the end. It is required to make this work. You may need to fill in or change additional properties beside these such as region, scheduling etc. (Based on your preferences and your AWS configuration) 2. RouteOnAttribute This processor filters objects read in the previous step, and makes sure only CSV files reach the next processor. Configuration: Config name Config value Comments Name Filter CSVs Optional csv_file ${filename:toUpper():endsWith(‘CSV’)} This attribute is added with the ‘Add Property’ option. The routing will be based on this property. See in the connections section. 3.  FetchS3Object FetchS3 object reads the content of the CSV files it receives. Configuration Config name Config value Comments Name Fetch CSV from S3 Optional Bucket nifi-solr-demo The same as provided for the ListS3 processor Object Key ${filename} It’s coming from the Flow File Access Key ID <My Access Key Id> The same as provided for the ListS3 processor Secret Access Key <My Secret Access Key> The same as provided for the ListS3 processor The values for Bucket, Access Key, and Secret Key are the same as in case of the List3 processor. The Object key is autofilled by NiFi, It comes as an input from the previous processors. 4. PutSolrContentStream Configuration Config name Config value Comments Name Index Data to DDE Optional Solr Type Cloud We will provide ZK ensemble as Solr location so this is required to be set to Cloud. Solr Location <ZK_ENSEMBLE> You find this value on the Dashboard of the Solr webUI, as the zkHost parameter value. Collection solr-nifi-demo-collection Here we use the collection which has been created above. If you specified a different name there then put the same here. Content Stream Path /update Be careful of the leading “/”. Content-Type application/csv Any content type that Solr can process may be provided here. In this example we use CSV. 
Build a flow in NiFi

Once you are done with collection creation, move over to the Flow Management Data Hub cluster. In the Services section of the Flow Management cluster details page, click the NiFi shortcut.

Add processors

Start adding processors by dragging the 'Processor' button onto the NiFi canvas. To build the example workflow we did, add the following processors (a standalone Python sketch of the equivalent S3-to-Solr calls follows the flow walkthrough below):

1. ListS3

This processor reads the content of the S3 bucket linked to your environment. Configuration:
- Name: Check for new Input (optional)
- Bucket: nifi-solr-demo (the S3 bucket where you uploaded your sample file)
- Access Key ID: <my access key> (generated for AWS users; you may generate and download a new one from AWS Management Console > Services > IAM > Users > select your user > Security credentials > Create access key)
- Secret Access Key: <my secret access key> (generated for AWS users, together with the Access Key ID)
- Prefix: input-data/ (the folder inside the bucket where the input CSV is located; the trailing "/" is required to make this work)

You may need to fill in or change additional properties besides these, such as region, scheduling, etc., based on your preferences and your AWS configuration.

2. RouteOnAttribute

This processor filters the objects read in the previous step and makes sure only CSV files reach the next processor. Configuration:
- Name: Filter CSVs (optional)
- csv_file: ${filename:toUpper():endsWith('CSV')} (this attribute is added with the 'Add Property' option; the routing is based on this property, see the connections section)

3. FetchS3Object

This processor reads the content of the CSV files it receives. Configuration:
- Name: Fetch CSV from S3 (optional)
- Bucket: nifi-solr-demo (the same as provided for the ListS3 processor)
- Object Key: ${filename} (it comes from the flow file)
- Access Key ID: <my access key> (the same as provided for the ListS3 processor)
- Secret Access Key: <my secret access key> (the same as provided for the ListS3 processor)

The values for Bucket, Access Key, and Secret Key are the same as for the ListS3 processor. The Object Key is autofilled by NiFi; it comes as an input from the previous processor.

4. PutSolrContentStream

Configuration:
- Name: Index Data to DDE (optional)
- Solr Type: Cloud (we provide the ZooKeeper ensemble as the Solr location, so this must be set to Cloud)
- Solr Location: <ZK_ENSEMBLE> (you find this value on the Dashboard of the Solr web UI, as the zkHost parameter value)
- Collection: solr-nifi-demo (the collection created above; if you specified a different name there, use that same name here)
- Content Stream Path: /update (be careful with the leading "/")
- Content-Type: application/csv (any content type that Solr can process may be provided here; in this example we use CSV)
- Kerberos principal: <my kerberos username> (since we use a direct URL to Solr, Kerberos authentication needs to be used here)
- Kerberos password: <my kerberos password> (password for the Kerberos principal)
- SSL Context Service: Default NiFi SSL Context Service (just choose it from the drop-down; the service is created by default from the Flow Management template)

5. LogMessage (x4)

We also created four LogMessage processors to track whether everything happens as expected:
a) Log Check, log message: Object checked out: ${filename}
b) Log Ignore, log message: File is not csv. Ignored: ${filename}
c) Log Fetch, log message: Object fetched: ${filename}
d) Log Index, log message: Data indexed from: ${filename}

6. In this workflow, the log processors are dead ends, so pick the "Automatically Terminate Relationships" option on them.

In this example, all properties not mentioned above were left at their default values during processor setup. Depending on your AWS and environment setup, you may need to set things differently. After setting up the processors, you should see all of them laid out on the NiFi canvas.

Create connections

Use your mouse to create the flow between the processors. The connections between the boxes are the success paths, except for the RouteOnAttribute processor, which has the csv_file and unmatched routes. The FetchS3Object and PutSolrContentStream processors have failure paths as well: direct them back to themselves, creating a retry mechanism on failure. This may not be the most sophisticated approach, but it serves its purpose. Once the connections are set, the flow is ready to run.

Run the NiFi Flow

You may start the processors one by one, or you may start the entire flow at once: with no processor selected, clicking the "Play" icon in the NiFi Operate Palette on the left starts the whole flow. If you did the setup exactly as described at the beginning of this post, two objects are almost instantly checked out (depending, of course, on your scheduling settings, if you set those):

- input-data/ – the input folder, which also matches the prefix provided for the ListS3 processor. No worries: in the next step it is filtered out and does not go further, as it is not a CSV file.
- films.csv – this goes into our collection if you did everything right.

After starting your flow, the ListS3 processor polls your S3 bucket on its schedule and looks for changes based on the "Last modified" timestamp. So if you put something new in your input-data folder, it is automatically processed, and if a file changes, it is rechecked too.
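To make the mechanics of the flow concrete, here is a rough Python equivalent of what the four main processors do for a single polling pass: list the bucket under the prefix, skip non-CSV keys, fetch each object, and POST its contents to the collection's /update handler. This is only an illustrative sketch, not the NiFi implementation itself; it assumes boto3, requests, and requests-gssapi are installed, a valid Kerberos ticket, and placeholder values for the Solr endpoint and truststore.

```python
# Rough, illustrative equivalent of the ListS3 -> RouteOnAttribute -> FetchS3Object
# -> PutSolrContentStream chain for a single polling pass. Not the NiFi flow itself.
# Assumptions: pip install boto3 requests requests-gssapi, a valid Kerberos ticket,
# and placeholder Solr endpoint / truststore values.
import boto3
import requests
from requests_gssapi import HTTPSPNEGOAuth

BUCKET = "nifi-solr-demo"
PREFIX = "input-data/"
SOLR_UPDATE = "https://<dde-solr-host>:8985/solr/solr-nifi-demo/update"  # placeholder

s3 = boto3.client("s3")
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)           # "ListS3"

for obj in listing.get("Contents", []):
    key = obj["Key"]
    if not key.upper().endswith("CSV"):                              # "RouteOnAttribute"
        print(f"File is not csv. Ignored: {key}")
        continue
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()      # "FetchS3Object"
    resp = requests.post(                                            # "PutSolrContentStream"
        SOLR_UPDATE,
        params={"commit": "true"},
        data=body,
        headers={"Content-Type": "application/csv"},
        auth=HTTPSPNEGOAuth(),
        verify="/path/to/cluster-ca.pem",
    )
    resp.raise_for_status()
    print(f"Data indexed from: {key}")
```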
Check the results

After the CSV has been processed, you can check your logs and your collection for the expected result.

Logs

1. In the Services section of the Flow Management cluster details page, click the Cloudera Manager shortcut.
2. Click the name of your compute cluster > click NiFi in the Compute Cluster box > under Status Summary click NiFi Node > click one of the nodes and click Log Files in the top menu bar > select Role Log File.

If everything went well, you will see log messages similar to the ones configured in the LogMessage processors above.

Indexed data

The indexed data appears in our collection, and you can see it on Hue.

Summary

In this post, we demonstrated how Cloudera Data Platform components can collaborate with each other while still being resource-isolated and managed separately. We created a Solr collection via Hue, built a data ingest workflow in NiFi to connect our S3 bucket with Solr, and ended up with indexed data ready for searching. There is no terminal magic in this scenario; we only used convenient UI features. With the indexing flow and Solr sitting in separate clusters, we gain more options around scalability, routing flexibility, and decorating data pipelines for multiple consuming workloads, while keeping security and governance consistent across them. Remember, this was only one simple example; the same basic setup offers endless opportunities to implement far more complex solutions. Feel free to try Data Discovery and Exploration in CDP on your own, play around with more advanced pipelines, and let us know how it goes! Alternatively, contact us for more information.

The post How-to: Index Data from S3 via NiFi Using CDP Data Hubs appeared first on Cloudera Blog.
We got Tableau certified, you can too! from What's New

Matthew Emerick
18 Sep 2020
5 min read
Keri Harling, Senior Copywriter, Tableau Software; Hannah Kuffner
September 18, 2020

Data skills are more important now than ever. Whether you just started at your university or are finishing up your final year, there's always something new to learn. Students are eligible to receive free Tableau licenses, eLearning, and 20% off the Tableau Desktop Specialist Certification through Tableau Academic Programs. To help set students up for success, we sat down with two amazing women on different data journeys to hear their advice on preparing for the Tableau Certification exam, along with a step-by-step guide. Spoiler: you'll crush it.

Bergen Schmetzer, Tableau Academic Programs: I've been at Tableau for four years, and have found my happy place on the Tableau Academic team. I was introduced to Tableau in my junior year of college, and it truly changed the way I look at data and analytics. The best advice I can give to students looking to start their analytics journey is just to take that first step. If you are feeling some sense of fear—that's GOOD! You are beginning something new and unfamiliar, and that's excitingly scary. Leveling up your skills, especially in data analytics, doesn't happen overnight. What I love about Tableau is the focus on supporting and elevating the people in our Community. Our goal is to provide people with the resources and skills to empower themselves. Plus, we are passionate about celebrating the success of like-minded data rockstars.

Kelly Nesenblatt, Student: I'm a senior at the University of Arizona and preparing to enter the workforce out of college. I saw a huge need for data skills at the companies I was interested in but didn't know where to start. I knew Tableau's Academic Programs offer Tableau Desktop, Prep, and eLearning for free to students, and I recently found out about the Desktop Specialist Certification discount. Not only was this an opportunity to add a certification to my resume, but it was also a great reason to strengthen my data skills. If I had one piece of advice to share—be confident in what you know. If you have prepared and are comfortable with the platform, the Specialist exam will greatly benefit you. Since passing the Specialist exam, my goal is to complete the Associate and Professional levels next.

Steps to pass the Certification Exam:

1. Join the Tableau for Students or Tableau for Teaching program
We've helped over one million students and instructors find empowerment in Tableau. Students and instructors can receive free licenses and eLearning through our Academic Programs.

2. Schedule your exam
It may sound crazy, but schedule your exam first. It's counter-intuitive, but setting a deadline for yourself will drive you to study. After you've been verified as a student, you will receive a 20% discount off the Tableau Certification Exam. The discount applies automatically during checkout. Your exam will be valid for six months after your purchase date, and you can reschedule anytime, but no later than 24 hours before the exam start time.

3. Download Tableau Desktop, access free eLearning
You've activated your Tableau license, scheduled your exam, and now it's time to study. eLearning is one of the best places to begin preparing for your exam. We recommend starting with Desktop I to get familiar with the terminology and basics of Tableau. Completing Desktop I takes around 10 hours, but since it's self-paced, you can go at whatever speed is comfortable for you. We don't mind waiting for greatness.
4. Practice makes perfect
We have a TON of support materials outside of eLearning to help prepare you for the exam. See below for some of our favorite go-to study resources:
- Training videos: we have hundreds of videos ranging from quick tips to deep dives in Tableau.
- Tableau for Student Guide: Maria Brock, a Tableau Student Ambassador, put together an entire website dedicated to students looking to learn about Tableau. She spoils us!

5. Get inspired by the Tableau Community
The Tableau Community is a group of brilliant Tableau cheerleaders. They love seeing people find the magic within Tableau, and are an amazing support system. You can get to know our Community in several ways:
- Read how other students use Tableau or hear from Tableau interns through our Generation Data blog series.
- Looking for specific answers? Our Student Ambassadors are Tableau Champions at their university and assist other students in their Tableau journey. Connect to the people and information you need most.
- Our global Community, the Tableau Community Forums, actively answers your questions. From dashboard designs to tips and tricks, we're here to help.
- Check out Tableau Public to see some incredible vizzes from members of the Community.

6. Day of exam
It's normal to have day-of-exam jitters, but if you've leveraged some of the resources we've shared in this blog, you've got nothing to worry about. Double-check that your systems are set up and ready for the exam, choose an environment with a reliable internet connection, and make sure you will be undisturbed throughout the exam period. The exam is timed, so it's important to remember you can always flag questions if you get stuck and come back to them later. Don't let one tricky question play mind games with you and make you lose confidence—you've got this.

7. You've leveled UP!
Celebrate your certification. If you passed, it's time to show it off. Share your well-deserved badge on social media and use the hashtag #CertifiablyTableau. Certifications are an identifiable way to demonstrate your data know-how and willingness to invest in your future. Having this certification under your belt will make you stand out amongst your peers to future employers. If you didn't pass the exam the first time, don't get discouraged. It happens to the best of us. The second time's a charm.

Join our Tableau for Students program to get started today and receive 20% off the Tableau Desktop Specialist exam.

Introducing improved online/offline flows in Tableau Mobile from What's New

Anonymous
17 Nov 2020
4 min read
Shweta Jindal, Product Manager; Jim Cox, Staff Product Manager
November 17, 2020

Having access to your data while on the go is important for making decisions at the speed of business. But as a Mobile user, you may not always be able to connect to Tableau Server or Tableau Online—perhaps you're on a plane or visiting a customer site where a network connection may be unavailable. Fortunately, Tableau Mobile provides offline access to all the interactive dashboards and views that you save as a Favorite. These downloaded vizzes are called Previews.

Today, we're excited to announce that we're introducing a change in the way that Tableau Mobile shows these previews to create a more seamless and intuitive experience—both when the device is connected and when it's not.

When the device is connected to the server

Current connected experience
Up until now, when you launch a favorite view, the preview loads quickly with limited interactivity, and displays a 'Go Live' button. Tapping 'Go Live' initiates a server request, and you'll see a spinner while the view is calculated. You will be switched to the view once it is rendered.

We thought this flow would work well for the majority of cases. The preview loads quickly and with sufficient interactivity—you can clearly see the data (pan, zoom, and scroll), tap any mark to see a tooltip, and see highlighted actions when a mark is tapped. Only when you attempt to change a filter value does the app warn that a server connection is required and instruct you to tap 'Go Live'. However, in the real world, we have observed that the majority of people tap 'Go Live' immediately. If the server is connected, they just want to see the latest version of the interactive dashboard.

New and improved connected experience
We want to reduce friction in your workflow so you can use the previews in a more helpful way. Now, when you tap on a view from Favorites, we show the preview immediately. We also make the server request for the latest view and load it in the background. There is no 'Go Live' button—instead, a banner message lets you know that the latest view is loading. If you haven't interacted with the preview, the app transitions seamlessly to the latest view when it has loaded, and the banner disappears.

With this flow, you see the preview immediately, while the latest view loads in the background—all without experiencing a spinner. Once the latest view is available and you haven't interacted with the preview, you will be switched to it automatically. If the latest view is taking a few seconds to load, you can opt to interact with the preview by scrolling or tapping. When this happens, we don't automatically transition to the latest view—we don't want to disrupt your flow. Instead, we added a button to the banner. The latest view, which has already loaded in the background, is surfaced when you tap 'See Latest View'.

When the device is not connected to the server

Current offline experience
Up until now, when you launch a view, the preview loads with limited interactivity. The Go Live button is presented as an option—even if the device is disconnected and can't go live. When you tap the Go Live button, Tableau Mobile attempts to contact the server but fails and displays an error message. In this case, you benefit from having the preview available immediately, even when offline. But you may not know that the device is disconnected—so tapping 'Go Live' and getting an error message is not the greatest experience.
New and improved offline experience
Now, if you're not connected, the banner reports that the app cannot load the latest view and offers you a button to see the reason. Tapping 'See Error Details' shows an error page explaining there is no server connection. In this case, you can continue to see and interact with the preview, but without a potentially confusing 'Go Live' button on the screen.

Summary
With these new flows, you'll transition seamlessly to the latest view when connected—without ever having to see a spinner or tap a button. Plus, you can interact with the dashboard while the latest view is loading. And when you're offline, the new flow shows the available preview without a confusing Go Live button. We hope these changes make using Tableau Mobile a faster and more pleasant experience for all.

Download the latest version of the Tableau Mobile App to enjoy this new experience—available on both the Apple App Store and Google Play. If you have any questions or feedback, please reach out at sjindal@tableau.com.