MapReduce Single-Table Join: Finding Grandchild-Grandparent Relationships

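I. Problem Description

Given a child-parent table, mine the parent-child relationships in it and produce a table of grandchild-grandparent relationships.

The input file contains: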

  child    parent
  Steven   Lucy
  Steven   Jack
  Jone     Lucy
  Jone     Jack
  Lucy     Mary
  Lucy     Frank
  Jack     Alice
  Jack     Jesse
  David    Alice
  David    Jesse
  Philip   David
  Philip   Alma
  Mark     David
  Mark     Alma

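The grandchild-grandparent relationships are derived from the parent and child generations. For example:

  Steven   Jack
  Jack     Alice
  Jack     Jesse

From these three records, Jack is Steven's parent, and Alice and Jesse are Jack's parents, so Steven is the grandson of Alice and Jesse. The mined result is:

  grandson    grandparent
  Steven      Jesse
  Steven      Alice

The task is to mine all grandchild-grandparent relationships with MapReduce.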
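Before turning to MapReduce, it can help to have a small in-memory reference to compare results against. The following is a minimal sketch in plain Java, not the article's method: it indexes parents by child and then follows each row one step further up. The class name `GrandparentReference` and the method `grandPairs` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; it is not part of the article's Hadoop job.
class GrandparentReference {

    /** Returns "grandchild<TAB>grandparent" for every two-step child -> parent chain. */
    static List<String> grandPairs(String[][] childParent) {
        // Index: child -> list of that child's parents.
        Map<String, List<String>> parentsOf = new HashMap<>();
        for (String[] row : childParent) {
            parentsOf.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        List<String> result = new ArrayList<>();
        for (String[] row : childParent) {
            // row[1] is a parent of row[0]; the parents of row[1] are
            // therefore the grandparents of row[0].
            List<String> grandparents =
                    parentsOf.getOrDefault(row[1], Collections.<String>emptyList());
            for (String grandparent : grandparents) {
                result.add(row[0] + "\t" + grandparent);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The three example records from the problem description.
        String[][] sample = {
            {"Steven", "Jack"}, {"Jack", "Alice"}, {"Jack", "Jesse"}
        };
        for (String pair : grandPairs(sample)) {
            System.out.println(pair);
        }
    }
}
```

On the three example records this yields the two pairs Steven/Alice and Steven/Jesse. It only works when the whole table fits in memory, which is the limitation the MapReduce version avoids.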

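II. Analysis

Solving this relies on a small trick: a single-table join, in which the table is joined with itself. In the Map phase, each input row is emitted as a key-value pair and also as the reversed value-key pair. Take two rows as an example:

  Steven   Jack
  Jack     Alice

Emitting both key-value and value-key turns them into 4 rows:

  Steven   Jack
  Jack     Alice
  Jack     Steven
  Alice    Jack

After the shuffle, Jack serves as the key that bridges the two generations: the values grouped under Jack contain Alice and Steven, so Alice and Steven must be grandparent and grandchild. To mark which values belong to the grandchild generation and which to the grandparent generation, add a prefix in the Map phase, for example "-" for the younger generation and "+" for the elder one. With the prefixes in place, the Reduce phase can split the values into two groups by prefix and pair every grandchild with every grandparent.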
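The map, shuffle, and reduce steps described above can be walked through in memory without Hadoop. This is only an illustrative sketch: the class `SelfJoinSketch` and the method `join` are hypothetical names, and a `TreeMap` stands in for the shuffle's grouping and key sorting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory walk-through; no Hadoop classes are involved.
class SelfJoinSketch {

    static List<String> join(String[][] childParent) {
        // "Map" + "shuffle": emit each row in both directions, grouped by key.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] row : childParent) {
            String child = row[0];
            String parent = row[1];
            // Keyed by the parent: the child is the younger side, tagged "-".
            grouped.computeIfAbsent(parent, k -> new ArrayList<>()).add("-" + child);
            // Keyed by the child: the parent is the elder side, tagged "+".
            grouped.computeIfAbsent(child, k -> new ArrayList<>()).add("+" + parent);
        }

        // "Reduce": per key, split values by prefix and take the cross product.
        List<String> result = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            List<String> younger = new ArrayList<>();
            List<String> elder = new ArrayList<>();
            for (String v : values) {
                if (v.startsWith("-")) {
                    younger.add(v.substring(1));
                } else {
                    elder.add(v.substring(1));
                }
            }
            for (String grandchild : younger) {
                for (String grandparent : elder) {
                    result.add(grandchild + "\t" + grandparent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The two example rows from the analysis.
        String[][] sample = {{"Steven", "Jack"}, {"Jack", "Alice"}};
        for (String pair : join(sample)) {
            System.out.println(pair);
        }
    }
}
```

For the two example rows, only the key Jack has both a "-" value (Steven) and a "+" value (Alice), so the single pair produced is Steven and Alice. Keys with values on only one side, such as Steven or Alice here, contribute nothing.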

III. MapReduce Program

    package com.javacore.hadoop;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.StringTokenizer;

    /**
     * Created by bee on 3/29/17.
     */
    public class RelationShip {

        public static class RsMapper extends Mapper<Object, Text, Text, Text> {

            @Override
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
                // The default TextInputFormat delivers one input line per call.
                StringTokenizer lineTokenizer = new StringTokenizer(value.toString());
                if (lineTokenizer.countTokens() < 2) {
                    return; // skip blank or malformed lines
                }
                String son = lineTokenizer.nextToken();
                String parent = lineTokenizer.nextToken();
                // Skip the header row by its content. A static line counter
                // would miscount once the input is split across map tasks.
                if ("child".equals(son) && "parent".equals(parent)) {
                    return;
                }
                // Keyed by the parent: the child is the younger side ("-").
                context.write(new Text(parent), new Text("-" + son));
                // Keyed by the child: the parent is the elder side ("+").
                context.write(new Text(son), new Text("+" + parent));
            }
        }

        public static class RsReducer extends Reducer<Text, Text, Text, Text> {

            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                // Write the header once per reduce task, before any key is processed.
                context.write(new Text("grandson"), new Text("grandparent"));
            }

            @Override
            public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
                List<Text> grandChild = new ArrayList<Text>();
                List<Text> grandParent = new ArrayList<Text>();
                // Split the values for this key by the prefix added in the mapper.
                for (Text val : values) {
                    String s = val.toString();
                    if (s.startsWith("-")) {
                        grandChild.add(new Text(s.substring(1)));
                    } else if (s.startsWith("+")) {
                        grandParent.add(new Text(s.substring(1)));
                    }
                }
                // Every child of the key is a grandchild of every parent of the key.
                for (Text text1 : grandChild) {
                    for (Text text2 : grandParent) {
                        context.write(text1, text2);
                    }
                }
            }
        }

        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            // Fall back to the paths used in this article when no arguments are given.
            String[] otherArgs = args.length == 0
                    ? new String[]{"input/relations/table.txt", "output"}
                    : args;
            if (otherArgs.length != 2) {
                System.err.println("Usage: RelationShip <input path> <output path>");
                System.exit(2);
            }
            // Remove any previous output directory, since the job fails if it exists.
            // The original called FileUtil.deleteDir("output"), a helper that is
            // not shown in this article; FileSystem.delete is used here instead.
            FileSystem.get(conf).delete(new Path(otherArgs[1]), true);

            Job job = Job.getInstance(conf, "relationship");
            job.setJarByClass(RelationShip.class);
            job.setMapperClass(RsMapper.class);
            job.setReducerClass(RsReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

IV. Output

  grandson    grandparent
  Mark        Jesse
  Mark        Alice
  Philip      Jesse
  Philip      Alice
  Jone        Jesse
  Jone        Alice
  Steven      Jesse
  Steven      Alice
  Steven      Frank
  Steven      Mary
  Jone        Frank
  Jone        Mary

