ml-deployment
Notes from Coursera [[ml-ops]]
Generally speaking, two things to keep in mind for deploying ML models:
- Start small, and ramp/scale up
- Always be able to roll back to previous versions
The resources available for deployment (compute, memory, latency budget) should be factored in when choosing an algorithm.
Shadow mode deployment
Don't rely on the proposed system for real decision making yet: the model shadows a human's decisions so we can gather data on how the model behaves (i.e. where it goes wrong). The baseline doesn't have to be a human; it can be a previous algorithm. We just need something to compare against.
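A minimal sketch of the idea, assuming a hypothetical `shadow_model` object with a `predict` method; the shadow prediction is only logged, never acted on:
```python
import logging

logger = logging.getLogger("shadow_mode")

def handle_request(features, trusted_decision, shadow_model):
    """Serve the trusted (human or previous-model) decision; only log what the shadow model would have done."""
    shadow_prediction = shadow_model.predict(features)
    # Record both outputs so agreement/disagreement can be analysed offline
    logger.info("trusted=%s shadow=%s", trusted_decision, shadow_prediction)
    # The shadow prediction never influences the response
    return trusted_decision
```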
Canary deployment
Roll out the model on a small fraction of traffic, monitor it, and ramp up gradually. The canary is meant to surface problems early, before they have significant consequences.
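A minimal sketch of the routing logic, assuming hypothetical `old_model` and `new_model` objects and a configurable canary fraction:
```python
import random

CANARY_FRACTION = 0.05  # start with ~5% of traffic; ramp up as confidence grows

def route_request(features, old_model, new_model):
    """Route a small random slice of traffic to the new model, the rest to the old one."""
    if random.random() < CANARY_FRACTION:
        return new_model.predict(features), "canary"
    return old_model.predict(features), "stable"
```
Tagging each response with its slice lets you compare metrics between canary and stable traffic; rolling back is just setting the fraction to zero.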
Blue-green deployment
Old = blue, new = green; data in your pipeline is directed by a router, which can send it to either blue or green. When the new version is ready for prime time, the router sends traffic to green. The advantage of this is being able to roll back easily.
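A minimal sketch of the router idea, with hypothetical `blue_model` and `green_model` objects and a single switch that can be flipped back instantly:
```python
# Which environment currently receives live traffic.
# Flip to "green" at cutover; flip back to "blue" to roll back.
ACTIVE_ENVIRONMENT = "blue"

def route_request(features, blue_model, green_model):
    """Send all traffic to whichever environment is currently active."""
    model = green_model if ACTIVE_ENVIRONMENT == "green" else blue_model
    return model.predict(features)
```
In practice the switch usually lives in a load balancer or service mesh rather than in application code, but the principle is the same: cutover and rollback are a single routing change.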
Deployment example
From this repo, an example would be to use FastAPI and uvicorn to serve an object detection model. A client interacts with the model by sending requests to the server.
One neat thing with this stack is that you can serve multiple models via different endpoints (sketched after this list), for example:
- myawesomemodel.com/count-cars/
- myawesomemodel.com/count-apples/
- myawesomemodel.com/count-plants/
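A minimal sketch of that layout, using hypothetical placeholder predictors in place of real models; the paths mirror the URLs above:
```python
from fastapi import FastAPI

app = FastAPI()

# Placeholder predictors standing in for real models (illustrative only)
def count_cars_in(image_url: str) -> int:
    return 0

def count_apples_in(image_url: str) -> int:
    return 0

# Each model gets its own endpoint
@app.get("/count-cars/")
def count_cars(image_url: str):
    return {"count": count_cars_in(image_url)}

@app.get("/count-apples/")
def count_apples(image_url: str):
    return {"count": count_apples_in(image_url)}
```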
Practically, you define callback patterns like `@app.get("/my-endpoint")`. The example in that notebook uses a `/predict` endpoint, handled by a `prediction` function that takes a model and an uploaded file as inputs. The uploaded file is opened as a byte stream, loaded into a NumPy array, and decoded into a cv2 image. The ML model then creates a new image with bounding boxes drawn around detected objects, writes it to disk, and sends the image back to the user. Roughly, with imports and a minimal `Model` enum filled in so the snippet stands on its own:
@app.post("/predict")
def prediction(model: Model, file: UploadFile = File(...)):
# 1. VALIDATE INPUT FILE
filename = file.filename
fileExtension = filename.split(".")[-1] in ("jpg", "jpeg", "png")
if not fileExtension:
raise HTTPException(status_code=415, detail="Unsupported file provided.")
# 2. TRANSFORM RAW IMAGE INTO CV2 image
# Read image as a stream of bytes
image_stream = io.BytesIO(file.file.read())
# Start the stream from the beginning (position zero)
image_stream.seek(0)
# Write the stream of bytes into a numpy array
file_bytes = np.asarray(bytearray(image_stream.read()), dtype=np.uint8)
# Decode the numpy array as an image
image = cv2.imdecode(file_bytes, cv2.IMREAD_COLOR)
# 3. RUN OBJECT DETECTION MODEL
# Run object detection
bbox, label, conf = cv.detect_common_objects(image, model=model)
# Create image that includes bounding boxes and labels
output_image = draw_bbox(image, bbox, label, conf)
# Save it in a folder within the server
cv2.imwrite(f'images_uploaded/{filename}', output_image)
# 4. STREAM THE RESPONSE BACK TO THE CLIENT
# Open the saved image for reading in binary mode
file_image = open(f'images_uploaded/{filename}', mode="rb")
# Return the image as a stream specifying media type
return StreamingResponse(file_image, media_type="image/jpeg")
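As a usage sketch, the client side could look like this with the `requests` library; the host, port, file name, and model choice below are assumptions:
```python
import requests

# Assumes the server was started locally, e.g. with: uvicorn main:app --port 8000
url = "http://localhost:8000/predict"

with open("car.jpg", "rb") as image_file:  # any local jpg/jpeg/png
    response = requests.post(
        url,
        params={"model": "yolov3-tiny"},  # query parameter matching the Model enum
        files={"file": ("car.jpg", image_file, "image/jpeg")},
    )

# The response body is the annotated image; save it locally
with open("car_with_boxes.jpg", "wb") as out_file:
    out_file.write(response.content)
```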