Sort and rank in a Spark RDD in one file

I have a Spark RDD as below:

(maths,60)
(english,65)
(english,77)
(maths,23)
(maths,50)

I need to sort and rank the given RDD in one go, as below:

(maths,23,1)
(maths,50,2)
(maths,60,3)
(english,65,1)
(english,77,2)

I know this can be done easily using a DataFrame, but I need Spark RDD code for the solution. Please suggest.

edited Nov 20 '18 at 6:47

mrsrinivas

asked Nov 20 '18 at 6:21

devD

scala apache-spark rdd

2 Answers
Spark RDD functions (so-called transformations) such as groupByKey and flatMap, together with Scala List functions such as sorted, should help achieve this.

val rdd = spark.sparkContext.parallelize(
  Seq(("maths", 60),
      ("english", 65),
      ("english", 77),
      ("maths", 23),
      ("maths", 50)))

val result = rdd.groupByKey().flatMap { case (subject, marks) =>
  marks.toList
    .sorted        // sort marks in ascending order
    .zipWithIndex  // add the 0-based position
    .map { case (mark, index) => (subject, mark, index + 1) } // 1-based rank
}

result.collect
// Array((english,65,1), (english,77,2), (maths,23,1), (maths,50,2), (maths,60,3))
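To see what the flatMap body does for each key, the same steps can be tried on plain Scala collections without a Spark session. This is only a minimal sketch: `RankSketch` and `rankWithinGroups` are hypothetical names, and `groupBy` on a local `Seq` stands in for `groupByKey` on the RDD.

```scala
object RankSketch {
  // Hypothetical helper: local stand-in for rdd.groupByKey().flatMap(...).
  def rankWithinGroups(rows: Seq[(String, Int)]): Seq[(String, Int, Int)] =
    rows
      .groupBy(_._1)   // Map(subject -> rows for that subject)
      .toSeq
      .sortBy(_._1)    // fix the group order so the output is deterministic
      .flatMap { case (subject, group) =>
        group.map(_._2)
          .sorted        // ascending marks
          .zipWithIndex  // 0-based position
          .map { case (mark, index) => (subject, mark, index + 1) } // 1-based rank
      }

  def main(args: Array[String]): Unit = {
    val rows = Seq(("maths", 60), ("english", 65), ("english", 77), ("maths", 23), ("maths", 50))
    rankWithinGroups(rows).foreach(println)
  }
}
```

As with the groupByKey version, all marks for one subject are sorted in memory, so this assumes each group is small enough to fit on a single executor.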

Databricks notebook

edited Nov 20 '18 at 6:55

answered Nov 20 '18 at 6:46

mrsrinivas

wow, thanks a ton mr srinivas.....

– devD
Nov 20 '18 at 15:48

@devD: Glad that helps!! consider marking the answer as accepted so that community can know the question has been answered.

– mrsrinivas
Nov 21 '18 at 4:18

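Another RDD solution:

// toDF needs this import outside spark-shell (spark-shell imports it automatically)
import spark.implicits._

val df = Seq(("maths",60),("english",65),("english",77),("maths",23),("maths",50)).toDF("subject","marks")
val rdd1 = df.rdd

rdd1.groupBy(x => x(0))               // group rows by subject
  .map { x =>
    val p = x._2.toList
      .map(a => a(1))
      .map(_.toString.toInt)          // marks column as Int
      .sortWith((a1, a2) => a1 < a2)  // ascending marks
      .zipWithIndex
      .map(b => (b._1, b._2 + 1))     // 1-based rank
    (x._1, p)
  }
  .flatMap(x => x._2.map((x._1, _)))  // one (subject, (marks, rank)) pair per row
  .collect.foreach(println)

Results:

(english,(65,1))
(english,(77,2))
(maths,(23,1))
(maths,(50,2))
(maths,(60,3))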
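The result above nests the marks and rank in an inner tuple, whereas the question asks for flat (subject, marks, rank) triples. Here is a minimal sketch of the extra flattening step, shown on a local `Seq`. `FlattenSketch` and `flatten` are hypothetical names. On the RDD, the same `map` with a pattern match should apply before `collect`.

```scala
object FlattenSketch {
  // Hypothetical helper: turn (subject, (marks, rank)) into (subject, marks, rank).
  def flatten(nested: Seq[(String, (Int, Int))]): Seq[(String, Int, Int)] =
    nested.map { case (subject, (marks, rank)) => (subject, marks, rank) }

  def main(args: Array[String]): Unit = {
    val nested = Seq(("english", (65, 1)), ("english", (77, 2)), ("maths", (23, 1)))
    flatten(nested).foreach(println)
  }
}
```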

answered Nov 20 '18 at 12:19

stack0114106
