UDF to Catalyst Expressions

To speedup the processing of user defined functions (UDFs), the RAPIDS Accelerator for Apache Spark introduces a UDF compiler extension to translate UDFs to Catalyst expressions.

To enable this operation on the GPU, set spark.rapids.sql.udfCompiler.enabled to true.

Be aware Spark may produce different results for a compiled UDF vs. the non-compiled. For example: a UDF of x/y where y happens to be 0, the compiled catalyst expressions will return NULL while the original UDF would fail the entire job with a java.lang.ArithmeticException: / by zero

When translating UDFs to Catalyst expressions, the supported UDF functions are limited:

Operand type Operation
Arithmetic Unary +x
  -x
Arithmetic Binary lhs + rhs
  lhs - rhs
  lhs * rhs
  lhs / rhs
  lhs % rhs
Logical lhs && rhs
  lhs || rhs
  !x
Equality and Relational lhs == rhs
  lhs < rhs
  lhs <= rhs
  lhs > rhs
  lhs >= rhs
Bitwise lhs & rhs
  lhs | rhs
  lhs ^ rhs
  ~x
  lhs « rhs
  lhs » rhs
  lhs »> rhs
Conditional if
  case
Math abs(x)
  cos(x)
  acos(x)
  asin(x)
  tan(x)
  atan(x)
  tanh(x)
  cosh(x)
  ceil(x)
  floor(x)
  exp(x)
  log(x)
  log10(x)
  sqrt(x)
  x.isNaN
Type Cast *
String lhs + rhs
  lhs.equalsIgnoreCase(String rhs)
  x.toUpperCase()
  x.trim()
  x.substring(int begin)
  x.substring(int begin, int end)
  x.replace(char oldChar, char newChar)
  x.replace(CharSequence target, CharSequence replacement)
  x.startsWith(String prefix)
  lhs.equals(Object rhs)
  x.toLowerCase()
  x.length()
  x.endsWith(String suffix)
  lhs.concat(String rhs)
  x.isEmpty()
  String.valueOf(boolean b)
  String.valueOf(char c)
  String.valueOf(double d)
  String.valueOf(float f)
  String.valueOf(int i)
  String.valueOf(long l)
  x.contains(CharSequence s)
  x.indexOf(String str)
  x.indexOf(String str, int fromIndex)
  x.replaceAll(String regex, String replacement)
  x.split(String regex)
  x.split(String regex, int limit)
  x.getBytes()
  x.getBytes(String charsetName)
Date and Time LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getYear
  LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getMonthValue
  LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getDayOfMonth
  LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getHour
  LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getMinute
  LocalDateTime.parse(x, DateTimeFormatter.ofPattern(pattern)).getSecond
Empty array creation Array.empty[Boolean]
  Array.empty[Byte]
  Array.empty[Short]
  Array.empty[Int]
  Array.empty[Long]
  Array.empty[Float]
  Array.empty[Double]
  Array.empty[String]
Arraybuffer new ArrayBuffer()
  x.distinct
  x.toArray
  lhs += rhs
  lhs :+ rhs
Method call Only if the method being called 1. Consists of operations supported by the UDF compiler, and 2. is one of the folllowing: a final method, a method in a final class, or a method in a final object
Captured variables Only primitive type variables captured from a method
Throwing exception Only if the exception thrown is a SparkException. The exception is then convered to a RuntimeException at runtime

All other expressions, including but not limited to try and catch, are unsupported and UDFs with such expressions cannot be compiled.