Which type of Python UDFs let you define Python functions that receive batches of input rows as Pandas DataFrames and return batches of results as Pandas arrays or Series?
Vectorized Python UDFs let you define Python functions that receive batches of input rows as Pandas DataFrames and return batches of results as Pandas arrays or Series. You call vectorized Py-thon UDFs the same way you call other Python UDFs.
Advantages of using vectorized Python UDFs compared to the default row-by-row processing pat-tern include:
The potential for better performance if your Python code operates efficiently on batches of rows.
Less transformation logic required if you are calling into libraries that operate on Pandas Data-Frames or Pandas arrays.
When you use vectorized Python UDFs:
You do not need to change how you write queries using Python UDFs. All batching is handled by the UDF framework rather than your own code.
As with non-vectorized UDFs, there is no guarantee of which instances of your handler code will see which batches of input.
Currently there are no comments in this discussion, be the first to comment!