UDF
User-defined functions (UDFs) run batch processing on a chain to generate new chain values. A UDF takes fields from one or more rows of the data and outputs new fields. UDFs can run at scale across multiple workers and processes.
A UDF can be any Python function. The classes below are useful for implementing a "stateful" UDF where a plain function is insufficient, such as when setup() or teardown() steps need to happen before or after the processing function runs.
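For the simple stateless case, a plain function can be passed directly to map(). A minimal sketch (the demo bucket is reused from the example below; the name_length signal is illustrative):

import datachain as dc

def name_length(file: dc.File) -> int:
    # Derive a new field from an existing one: the length of the file name.
    return len(file.name)

(
    dc.read_storage("gs://datachain-demo/fashion-product-images/images")
    .map(name_length, params=["file"], output={"name_length": int})
    .show()
)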
    
UDFBase
Bases: AbstractUDF
Base class for stateful user-defined functions.
Any class that inherits from it must have a process() method that takes input
params from one or more rows in the chain and produces the expected output.
Optionally, the class may include these methods:
- setup() to run code on each worker before process() is called.
- teardown() to run code on each worker after process() completes.
Example
import datachain as dc
import open_clip

class ImageEncoder(dc.Mapper):
    def __init__(self, model_name: str, pretrained: str):
        self.model_name = model_name
        self.pretrained = pretrained

    def setup(self):
        # Load the CLIP model once per worker, before any rows are processed.
        self.model, _, self.preprocess = (
            open_clip.create_model_and_transforms(
                self.model_name, self.pretrained
            )
        )

    def process(self, file) -> list[float]:
        # Called per row: read the image and return its embedding.
        img = file.get_value()
        img = self.preprocess(img).unsqueeze(0)
        emb = self.model.encode_image(img)
        return emb[0].tolist()
(
    dc.read_storage(
        "gs://datachain-demo/fashion-product-images/images", type="image"
    )
    .limit(5)
    .map(
        ImageEncoder("ViT-B-32", "laion2b_s34b_b79k"),
        params=["file"],
        output={"emb": list[float]},
    )
    .show()
)
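The same chain can be scaled out. As a sketch, assuming your DataChain version exposes the parallel option of settings(), the encoder above could run across several local worker processes:

(
    dc.read_storage(
        "gs://datachain-demo/fashion-product-images/images", type="image"
    )
    .settings(parallel=4)  # assumption: distribute the UDF over 4 processes
    .map(
        ImageEncoder("ViT-B-32", "laion2b_s34b_b79k"),),
        params=["file"],
        output={"emb": list[float]},
    )
    .save("image_embeddings")
)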
name (property)
    Returns the name of the function or class that implements the UDF.

process()
    Processing function that must be defined by the user.

setup()
    Initialization process executed on each worker before processing begins. This is needed for tasks like pre-loading ML models prior to scoring.

teardown()
    Teardown process executed on each process/worker after processing ends. This is needed for tasks like closing connections to endpoints.
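To illustrate the setup()/teardown() lifecycle beyond model loading, here is a sketch of a stateful UDF that holds a per-worker database connection; the labels table and database path are hypothetical:

import sqlite3
import datachain as dc

class LabelLookup(dc.Mapper):
    def __init__(self, db_path: str):
        self.db_path = db_path

    def setup(self):
        # Open one connection per worker before any rows are processed.
        self.conn = sqlite3.connect(self.db_path)

    def process(self, file) -> str:
        # Look up a label for each file by name (hypothetical schema).
        row = self.conn.execute(
            "SELECT label FROM labels WHERE name = ?", (file.name,)
        ).fetchone()
        return row[0] if row else ""

    def teardown(self):
        # Release the connection once the worker has processed all rows.
        self.conn.close()

# Applied like any stateful UDF:
# chain.map(LabelLookup("labels.db"), params=["file"], output={"label": str})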