I would like to capture the output of show() in PySpark as a string, similar to here and here. I could only find a solution for Scala, not for PySpark.

df.show()
#+----+-------+
#| age|   name|
#+----+-------+
#|null|Michael|
#|  30|   Andy|
#|  19| Justin|
#+----+-------+

The ultimate goal is to pass this output as a string to logger.info. I tried logger.info(df.show()), but that only prints the table to the console.

1 Answer

You can build a helper function using the same approach as in the post you linked, Capturing the result of explain() in pyspark. If you examine the source code for show(), you will see that it calls self._jdf.showString() and prints the result.

The answer depends on which version of Spark you are using, because the arguments to show() have changed over time.

Spark Version 2.3 and above

In version 2.3, the vertical argument was added.

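def getShowString(df, n=20, truncate=True, vertical=False):
    # truncate=True means the default column width of 20 characters
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20, vertical)
    # truncate=False becomes 0 (no truncation); an int sets the width
    return df._jdf.showString(n, int(truncate), vertical)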

Spark Versions 1.5 through 2.2

The truncate argument was added in version 1.5.

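def getShowString(df, n=20, truncate=True):
    # truncate=True means the default column width of 20 characters
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20)
    # truncate=False becomes 0 (no truncation); an int sets the width
    return df._jdf.showString(n, int(truncate))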

Spark Versions 1.3 through 1.4

The show function was first introduced in version 1.3.

def getShowString(df, n=20):
    return df._jdf.showString(n)
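
If the same code has to run against several Spark versions, the three variants above could be folded into one function that picks the argument list from the version string. This is only a minimal sketch: show_string_args and get_show_string_any are hypothetical names, spark_version is assumed to be a string such as the one printed by spark.version, and the version boundaries are copied from the headings above rather than verified independently against every release.

```python
def show_string_args(spark_version, n=20, truncate=True, vertical=False):
    """Build the positional arguments for df._jdf.showString.

    Assumption: the boundaries (1.3, 1.5, 2.3) are the ones stated in
    this answer; they are not checked against each Spark release here.
    """
    # "2.4.0" -> (2, 4); anything after the minor version is ignored
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    # True -> default width 20, False -> 0 (no truncation), int -> that width
    width = 20 if isinstance(truncate, bool) and truncate else int(truncate)
    if (major, minor) >= (2, 3):
        return (n, width, vertical)
    if (major, minor) >= (1, 5):
        return (n, width)
    return (n,)


def get_show_string_any(df, spark_version, n=20, truncate=True, vertical=False):
    """Hypothetical wrapper: call showString with version-appropriate arguments."""
    args = show_string_args(spark_version, n, truncate, vertical)
    return df._jdf.showString(*args)
```

When the Spark version is known in advance, the version-specific getShowString definitions above remain the simpler choice.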

Now use the helper function as follows:

x = getShowString(df)  # default arguments
print(x)
#+----+-------+
#| age|   name|
#+----+-------+
#|null|Michael|
#|  30|   Andy|
#|  19| Justin|
#+----+-------+

Or in your case:

logger.info(getShowString(df))
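
An alternative that avoids the private _jdf attribute is to capture whatever show() prints. The sketch below assumes that show() writes its table through Python's sys.stdout (for example via print), which is consistent with the console output seen in the question but may not hold in every environment. capture_show and log_show are hypothetical helper names, not part of PySpark.

```python
import contextlib
import io
import logging


def capture_show(df, *args, **kwargs):
    """Return the text that df.show(*args, **kwargs) would print.

    Assumption: show() writes to Python's sys.stdout.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.show(*args, **kwargs)
    return buffer.getvalue()


def log_show(logger, df, level=logging.INFO, **show_kwargs):
    """Log the table only if the level is enabled, so that show()
    is not evaluated for messages that would be discarded."""
    if logger.isEnabledFor(level):
        table = capture_show(df, **show_kwargs).rstrip("\n")
        # Start on a new line so the table columns stay aligned
        logger.log(level, "\n%s", table)
```

With these helpers, log_show(logger, df) or logger.info(capture_show(df)) would take the place of logger.info(getShowString(df)).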
  • Hmm, I got an error saying showString does not exist: An error occurred while calling o10175.showString. Trace: py4j.Py4JException: Method showString([class java.lang.Integer]) does not exist at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318) at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326) at py4j.Gateway.invoke(Gateway.java:274)
    – Kenny
    Commented Apr 12, 2019 at 15:04
  • @Kenny What version of Spark are you using (print(spark.version))? You have to use the version of the function that matches your Spark version.
    – pault
    Commented Apr 12, 2019 at 15:05
  • Version 2.2. Please disregard the error; I got mixed up between n and 20. There should be 2 params there. Great answer, thanks @pault
    – Kenny
    Commented Apr 12, 2019 at 15:08
  • Unbelievable that they haven't provided such helper functions themselves yet, not even in the 3.x versions.
    – ciurlaro
    Commented Nov 27, 2020 at 17:15
  • It is working as expected. Thanks a lot. Commented Aug 2, 2023 at 15:03
