Apache Spark Graph Processing

Product type: Book
Published: Sep 2015
ISBN-13: 9781784391805
Pages: 148
Edition: 1st Edition

Performance optimization


In addition to the sendMsg and mergeMsg functions, aggregateMessages can also take an optional TripletFields argument, which indicates what data in EdgeContext is accessed. The main reason to specify this information explicitly is to help GraphX optimize the performance of the aggregateMessages operation.

In fact, TripletFields represents a subset of the fields of EdgeTriplet, and it enables GraphX to populate only the fields that are necessary.
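To make the idea concrete, here is a toy, plain-Scala model (not GraphX code) of what "populating only the necessary fields" saves. The names `FieldsNeeded`, `Triplet`, `buildTriplet`, `AllFields`, and `DstAndEdge` are hypothetical; the sketch assumes that building a triplet means looking up the source and destination vertex attributes, and it simply skips the lookups that the requested subset rules out.

```scala
// Toy model, NOT the GraphX implementation: it only illustrates why declaring
// a subset of triplet fields lets a framework skip work.
object TripletFieldsModel {

  // Hypothetical stand-in for TripletFields: which parts of a triplet are needed.
  final case class FieldsNeeded(useSrc: Boolean, useDst: Boolean, useEdge: Boolean)

  val AllFields: FieldsNeeded = FieldsNeeded(useSrc = true, useDst = true, useEdge = true)
  val DstAndEdge: FieldsNeeded = FieldsNeeded(useSrc = false, useDst = true, useEdge = true)

  // A triplet whose attributes are optional: unneeded ones are never filled in.
  final case class Triplet[VD, ED](
      srcId: Long, dstId: Long,
      srcAttr: Option[VD], dstAttr: Option[VD], attr: Option[ED])

  // Builds the triplet for one edge and counts the vertex-attribute lookups
  // performed (a proxy for vertex data that would have to be joined to edges).
  def buildTriplet[VD, ED](
      srcId: Long, dstId: Long, edgeAttr: ED,
      vertices: Map[Long, VD], fields: FieldsNeeded): (Triplet[VD, ED], Int) = {
    var lookups = 0
    val src = if (fields.useSrc) { lookups += 1; Some(vertices(srcId)) } else None
    val dst = if (fields.useDst) { lookups += 1; Some(vertices(dstId)) } else None
    val e = if (fields.useEdge) Some(edgeAttr) else None
    (Triplet(srcId, dstId, src, dst, e), lookups)
  }

  def main(args: Array[String]): Unit = {
    val vertices = Map(1L -> "alice", 2L -> "bob")
    val (full, fullLookups) = buildTriplet(1L, 2L, 0.5, vertices, AllFields)
    val (partial, partialLookups) = buildTriplet(1L, 2L, 0.5, vertices, DstAndEdge)
    println(s"$full used $fullLookups lookups")
    println(s"$partial used $partialLookups lookups")
  }
}
```

With the full subset both vertex attributes are looked up; with the destination-and-edge subset only one is, and `srcAttr` stays empty.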

The default value is TripletFields.All, which means that the sendMsg function may access any of the fields in the EdgeContext class. If sendMsg needs only part of the EdgeContext, passing a narrower TripletFields value tells GraphX which fields are required, so that it can choose a more efficient join strategy. All possible options for TripletFields are listed as follows:

  • TripletFields.All: This option exposes all the fields (source, edge, and destination).

  • TripletFields.Dst: This option exposes the destination and edge fields, but not the source field...
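As a sketch, assuming `spark-core` and `spark-graphx` are on the classpath and a local `SparkContext` can be created, the call below gives each vertex the largest value of `dstAttr + attr` over its outgoing edges. Because `sendMsg` reads only `ctx.dstAttr` and `ctx.attr`, it passes `TripletFields.Dst` as the third argument. The small graph and the names `TripletFieldsExample` and `maxNeighborScore` are illustrative, not taken from the book.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeContext, Graph, TripletFields, VertexRDD}

object TripletFieldsExample {

  // For each vertex with outgoing edges: max of (destination score + edge weight).
  // sendMsg reads only the destination attribute and the edge attribute, so
  // TripletFields.Dst is sufficient; source attributes are never touched.
  def maxNeighborScore(graph: Graph[Double, Double]): VertexRDD[Double] =
    graph.aggregateMessages[Double](
      (ctx: EdgeContext[Double, Double, Double]) => ctx.sendToSrc(ctx.dstAttr + ctx.attr),
      (a, b) => math.max(a, b),
      TripletFields.Dst)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("triplet-fields").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val vertices = sc.parallelize(Seq((1L, 1.0), (2L, 5.0), (3L, 2.0)))
      val edges = sc.parallelize(Seq(Edge(1L, 2L, 0.5), Edge(1L, 3L, 4.0), Edge(2L, 3L, 1.0)))
      val graph = Graph(vertices, edges)
      // Sort by vertex id so the printed order is deterministic.
      maxNeighborScore(graph).collect().sortBy(_._1).foreach(println)
    } finally sc.stop()
  }
}
```

Run locally, this prints `(1,6.0)` and `(2,3.0)`; vertex 3 has no outgoing edges, so it receives no message. Omitting the third argument falls back to `TripletFields.All`, which yields the same result but lets GraphX assume the source attributes are needed too.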
