Reader small image

You're reading from  Extending Power BI with Python and R - Second Edition

Product typeBook
Published inMar 2024
Reading LevelIntermediate
PublisherPackt
ISBN-139781837639533
Edition2nd Edition
Languages
Right arrow
Author (1)
Luca Zavarella
Luca Zavarella
author image
Luca Zavarella

Luca Zavarella has a rich background as an Azure Data Scientist Associate and Microsoft MVP, with a Computer Engineering degree from the University of L'Aquila. His decade-plus experience spans the Microsoft Data Platform, starting as a T-SQL developer on SQL Server 2000 and 2005, then mastering the full suite of Microsoft Business Intelligence tools (SSIS, SSAS, SSRS), and advancing into data warehousing. Recently, his focus has shifted to advanced analytics, data science, and AI, contributing to the community as a speaker and blogger, especially on Medium. Currently, he leads the Data & AI division at iCubed, and he also holds an honors degree in classical piano from the "Alfredo Casella" Conservatory in L'Aquila.
Read more about Luca Zavarella

Right arrow

Importing large datasets with R

The same scalability limitations illustrated for Python packages used to manipulate data also exist for R packages in the Tidyverse ecosystem. Even in R, it is not possible to use a dataset larger than the available RAM on the machine. The first solution that is adopted in these cases is also to switch to Spark-based distributed systems that provide the SparkR language. It provides a distributed implementation of the DataFrame you are used to in R, supporting filtering, aggregation, and selection operations as you do with the dplyr package. For those of us who are fans of the Tidyverse world, RStudio is actively developing the sparklyr package, which allows you to use all the functionality of dplyr, even for distributed DataFrames. However, using Spark-based systems to process CSVs that together take up little more than the RAM you have available on your machine may be overkill due to the overhead of all the Java infrastructure needed to run them.In the...

lock icon
The rest of the page is locked
Previous PageNext Chapter
You have been reading a chapter from
Extending Power BI with Python and R - Second Edition
Published in: Mar 2024Publisher: PacktISBN-13: 9781837639533

Author (1)

author image
Luca Zavarella

Luca Zavarella has a rich background as an Azure Data Scientist Associate and Microsoft MVP, with a Computer Engineering degree from the University of L'Aquila. His decade-plus experience spans the Microsoft Data Platform, starting as a T-SQL developer on SQL Server 2000 and 2005, then mastering the full suite of Microsoft Business Intelligence tools (SSIS, SSAS, SSRS), and advancing into data warehousing. Recently, his focus has shifted to advanced analytics, data science, and AI, contributing to the community as a speaker and blogger, especially on Medium. Currently, he leads the Data & AI division at iCubed, and he also holds an honors degree in classical piano from the "Alfredo Casella" Conservatory in L'Aquila.
Read more about Luca Zavarella