
I am trying to read streaming tweets with Spark Streaming.

This is my software configuration:

Ubuntu 14.04.2 LTS

scala -version

Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL

spark-submit --version

Spark version 1.6.0

The code is as follows.

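import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils
import Utilities._

object PrintTweets
{
 def main(args: Array[String]) {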

  // Configure Twitter credentials using twitter.txt
  setupTwitter()

  // Set up a Spark streaming context named "PrintTweets" that runs locally using
  // all CPU cores and one-second batches of data
  val ssc = new StreamingContext("local[*]", "PrintTweets", Seconds(1))

  // Get rid of log spam (should be called after the context is set up)
  setupLogging()

  // Create a DStream from Twitter using our streaming context
  val tweets = TwitterUtils.createStream(ssc, None)


  // Now extract the text of each status update into RDD's using map()
  val statuses = tweets.map(status => status.getText())

  // Print out the first ten
  statuses.print()

  // Kick it all off
  ssc.start()
  ssc.awaitTermination()
 }
}

Utilities.scala

object Utilities {
  /** Makes sure only ERROR messages get logged to avoid log spam. */
  def setupLogging() = {
    import org.apache.log4j.{Level, Logger}
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)
  }

  /** Configures Twitter service credentials using twitter.txt in the main workspace directory */
  def setupTwitter() = {
    import scala.io.Source

    for (line <- Source.fromFile("./data/twitter.txt").getLines) {
      val fields = line.split(" ")
      if (fields.length == 2) {
        System.setProperty("twitter4j.oauth." + fields(0), fields(1))
      }
    }
  }
}

Issues:

Since it needs the twitter4j library, I added twitter4j-core-4.0.4 and twitter4j-stream-4.0.4 to the Eclipse build path as external JARs.

Then I ran the program. It did not throw any error, but no tweets appeared in the console; the output was empty.

So I looked at some forums and downgraded twitter4j to 3.0.3. I also chose the Scala 2.10 Library container in the Eclipse Build Path window.

After that I got a java.lang.NoSuchMethodError at run time.

16/05/14 11:46:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoSuchMethodError: twitter4j.TwitterStream.addListener(Ltwitter4j/StreamListener;)V
at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:72)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Please help me resolve this. I initially installed a Spark build that uses Scala 2.11. Is that the problem? Do I need to uninstall everything and reinstall Scala 2.10 and then the pre-compiled Spark package?

Or do I need to have Scala 2.10 on my system in addition to Scala 2.11?

  • You have answered your own question. Since everything you're using is built with Scala 2.11, why did you choose Scala 2.10 in Eclipse? Could you change it to Scala 2.11 and try again? Commented May 15, 2016 at 9:23
  • Hey, thanks everyone. I was able to solve it after looking at the process graph in the Spark UI. I am running the program inside a VM, and the problem was due to the local[*] setting. After I changed it to local[2], I was able to see the tweets. I think local[*] was not able to acquire enough threads to process the incoming tweet stream. Commented May 16, 2016 at 23:01
  • Frankly, I do not see how this resolved the problem. IMHO, the problem has nothing to do with the amount of resources assigned to the job. But good for you anyway :) Commented May 17, 2016 at 7:44
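
For context on the local[*] versus local[2] comment above: a receiver-based stream such as the Twitter one keeps one thread busy for the receiver itself, and the Spark Streaming programming guide advises using a local master with more threads than receivers. local[*] uses as many worker threads as the machine reports logical cores, so on a single-core VM it would leave nothing for processing the batches. The following is a minimal sketch of that rule, not something from the original post; LocalMaster and forReceivers are hypothetical names and are not part of Spark.

```scala
// Hypothetical helper (not a Spark API): build a local master string that
// always leaves at least one thread for batch processing once every
// receiver has taken a thread of its own.
object LocalMaster {
  def forReceivers(
      receivers: Int,
      availableCores: Int = Runtime.getRuntime.availableProcessors): String = {
    require(receivers >= 0, "receivers must be non-negative")
    // Use every core the machine reports, but never fewer than receivers + 1.
    val threads = math.max(availableCores, receivers + 1)
    s"local[$threads]"
  }
}
```

Assuming one Twitter receiver, the result of LocalMaster.forReceivers(1) could be passed as the first argument of the StreamingContext constructor in place of "local[*]".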

2 Answers

3

The exception above seems to be caused by an incompatibility between Spark 1.6.0 and twitter4j 3.0.3.

The onStart method of org.apache.spark.streaming.twitter.TwitterReceiver works with a twitter4j.TwitterStream and expects it to have an addListener method that takes an instance of twitter4j.StreamListener.

twitter4j 3.0.3 has no method twitter4j.TwitterStream.addListener(StreamListener); instead it has several other addListener overloads, each of which takes a subtype of StreamListener.

twitter4j 4.0.4 has the required method, which is why no error occurs with that version. So switching to twitter4j 3.0.3 will not solve the problem.

The problem is somewhere else.
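
One way to confirm which addListener overloads the twitter4j JAR on the build path actually exposes is plain Java reflection. The sketch below is only a diagnostic under stated assumptions: ListenerCheck and overloads are hypothetical names, not part of Spark or twitter4j, and the result is meaningful only when it is run with the same classpath as the failing program.

```scala
// Hypothetical diagnostic (not part of Spark or twitter4j): list the
// parameter types of every one-argument overload of a method on a class.
object ListenerCheck {
  /** Returns None when the class is not on the classpath. */
  def overloads(className: String, methodName: String): Option[Seq[String]] =
    try {
      val cls = Class.forName(className)
      Some(
        cls.getMethods.toSeq
          .filter(m => m.getName == methodName && m.getParameterTypes.length == 1)
          .map(_.getParameterTypes.head.getName)
          .distinct
          .sorted)
    } catch {
      case _: ClassNotFoundException => None
    }

  def main(args: Array[String]): Unit =
    overloads("twitter4j.TwitterStream", "addListener") match {
      case Some(types) => types.foreach(t => println(s"addListener($t)"))
      case None        => println("twitter4j.TwitterStream is not on the classpath")
    }
}
```

If twitter4j.StreamListener does not appear among the printed parameter types, the JAR in use lacks the overload named in the NoSuchMethodError.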

0

In my case, I had a Spark Java project. I cleaned the pom file and started adding dependencies back in order: first I resolved the Spark-related errors, then the Spark launcher, and then the remaining libraries, larger ones first. Note that I was using a cdh6.2.0 environment.
