Having the largest portable social graph, we encounter challenges everyday such as the "Super Labeler" below, a problem fundamental in figuring out the mapping of social networks to individuals. It's extremely important that the solution to a problem, like the one below, be parallelizable and highly scalable since we deal with terabytes of data.
This is our prize problem. Think you have what it takes to top our solution? If so, send us your solution and resume at challenge@rapleaf.com. We'd love to speak to you.
You run a company which specializes in labeling "anything fast, cheap, and efficient". You normally specialize in boxes, consumer items, and the like, but you were recently approached by a client with the following labeling needs:
"I have an abstract graph with billions of edges that needs to be labeled ASAP. I need each connected component to be labeled with a unique identifier. I don't care what ids you use, but the output needs to be mappings from nodes to their assigned id."
You object that you don't have the computing resources to take on such a daring task, but the client tells you that you can use his cluster of computers which has a MapReduce implementation installed on it. His MapReduce implementation reads and writes into a distributed filesystem which you also have access to.
As input, the client will give you a list of edges between nodes. Nodes are identified with a unique number that has been randomly chosen. There are no single-node components. As output, you need to output a mapping between nodes and unique ids for the components. Devise an efficient MapReduce algorithm to accomplish this task.
Input: |
Output: |
Hint #1: Solving this problem will require more than one MapReduce job Hint #2:Check out the following non-performant solution: Sub-Optimal Solution |
Please send your solution (and your resume if you have one ready) to challenge@rapleaf.com.

Follow