Core ML Background Removal in SwiftUI

Use the DeepLabV3 image segmentation model to add, remove, and modify image backgrounds in your iOS app

May 29, 2021

CoreML Image Segmentation. DeepLabV3 background remove, change, transparent — Original Photo by Brooke Cagle on Unsplash. Final results by the author.

Core ML is Apple’s mobile machine learning framework that lets you deploy, run, and re-train models on a device.

From text and sound to image recognition, Core ML covers a wide range of tasks. On top of it sits Vision, Apple’s own computer vision framework, which provides more than half a dozen built-in models. More importantly, it acts as a container for Core ML models, which makes pre-processing and inference a whole lot easier.

In this tutorial, we’ll be implementing one of the most popular use cases of machine learning on mobile: image segmentation in our iOS application.

Image segmentation is a deep learning technique that assigns a class label to every pixel in an image, which lets us separate different objects from one another. It’s commonly used in self-driving cars. Unlike object detection, which draws bounding boxes around parts of an image, segmentation produces a pixel-level mask.

CoreML Semantic Segmentation Foreground and Background — Source: Apple Developer

In the next few sections, we’ll be using a DeepLabV3 model to segment the foreground and background parts of an image in our SwiftUI application. By doing so, you’ll be able to add, remove, and modify the backgrounds in your photos. After all, who wouldn’t want to swap their boring backgrounds for beautiful virtual ones?

Without wasting any more time, let’s get started.

Obtaining the DeepLab Core ML Model

Previously, you’d have to convert the model from other formats such as PyTorch and TensorFlow, but now Apple provides a downloadable Core ML file that can be used in Xcode directly. You can obtain the DeepLabV3 Core ML model from Apple’s Machine Learning page.

Create a new Xcode project with SwiftUI as the user interface and drag and drop the Core ML file into it. You should see the Core ML model description as shown below:

DeepLabV3 CoreML description — Xcode Core ML viewer. Screenshot by the author.

The input type is an Image with 513 x 513 dimensions, while the output is an MLMultiArray of the same size. We’ll soon see how to convert the output type into our desired image format.

But first, let’s set up our SwiftUI view.

Setting Up Our SwiftUI View

The following code displays two images on the screen that are spaced equally. The one on the left is the input image and the one on the right will eventually show the segmentation results.

	import SwiftUI

	struct ContentView: View {

	    // Both images start as the bundled "unsplash" asset; outputImage is
	    // replaced once segmentation finishes.
	    @State var outputImage: UIImage = UIImage(named: "unsplash")!
	    @State var inputImage: UIImage = UIImage(named: "unsplash")!

	    var body: some View {
	        ScrollView {
	            VStack {
	                HStack {
	                    // Left: the original photo.
	                    Image(uiImage: inputImage)
	                        .resizable()
	                        .aspectRatio(contentMode: .fit)

	                    Spacer()

	                    // Right: the segmentation result.
	                    Image(uiImage: outputImage)
	                        .resizable()
	                        .aspectRatio(contentMode: .fit)
	                }

	                Spacer()

	                Button(action: { runVisionRequest() }, label: {
	                    Text("Run Image Segmentation")
	                })
	                .padding()
	            }
	        }
	    }
	    //.... more here
	}

Note: The runVisionRequest function invoked in the SwiftUI Button action is where we’ll implement the Core ML image segmentation.

Here’s a screengrab of the current SwiftUI view:

SwiftUI CoreML Vision Segmentation Model — iOS simulator. Screenshot by the author.

Running Image Segmentation Using Vision Request

Next, let’s set up our vision request to run the DeepLabV3 image segmentation model:

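	// Requires `import Vision` at the top of the file.
	func runVisionRequest() {

	    // Wrap the generated Core ML class so Vision can drive it, and read the
	    // CGImage up front instead of force-unwrapping it on a background thread.
	    guard let model = try? VNCoreMLModel(for: DeepLabV3(configuration: .init()).model),
	          let cgImage = inputImage.cgImage
	    else { return }

	    let request = VNCoreMLRequest(model: model, completionHandler: visionRequestDidComplete)
	    request.imageCropAndScaleOption = .scaleFill

	    // Run inference off the main thread.
	    DispatchQueue.global().async {
	        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])

	        do {
	            try handler.perform([request])
	        } catch {
	            print(error)
	        }
	    }
	}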

There are a few things we can take away from the code above:

Core ML deprecated the default init method (DeepLabV3()) in iOS 14, so we’ve used the new init(configuration:).
VNCoreMLModel is a container for Core ML models. We need this format to perform Vision processing using the VNCoreMLRequest.
Once the VNCoreMLRequest is completed, it triggers the completion handler function. In our case, we’ve defined it in a visionRequestDidComplete function.
VNImageRequestHandler is where our Vision request is triggered. We pass the input image to its initializer (Vision pre-processes the image to match the model input size) and pass the VNCoreMLRequest to handler.perform.
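
To see why the mask can later be stretched straight back to the photo’s size, it helps to sketch what .scaleFill does to coordinates. The snippet below assumes that .scaleFill resizes each axis independently to the 513 x 513 model input without preserving the aspect ratio. The type and function names are illustrative and are not part of Vision.

```swift
import Foundation

/// Illustrative only: models a non-uniform stretch between the photo and the
/// 513 x 513 model input. None of these names exist in Vision.
struct ScaleFillMapping {
    let imageWidth: Double
    let imageHeight: Double
    let modelSide: Double = 513

    /// Maps a position in the model's grid back to the original photo.
    func imagePoint(forModelX x: Double, modelY y: Double) -> (x: Double, y: Double) {
        return (x: x * imageWidth / modelSide, y: y * imageHeight / modelSide)
    }
}

// A portrait photo of 1026 x 2052 pixels (made-up numbers).
let mapping = ScaleFillMapping(imageWidth: 1026, imageHeight: 2052)
let center = mapping.imagePoint(forModelX: 256.5, modelY: 256.5)
print(center) // the center of the model grid lands on the center of the photo
```

Under that assumption, resizing the model’s output to the input image’s size is enough to line the mask up with the photo again.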

Retrieving the Segmentation Mask From the Output

Once the VNCoreMLRequest is completed, we can handle the results in the visionRequestDidComplete completion handler that’s defined below:

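	func visionRequestDidComplete(request: VNRequest, error: Error?) {
	    // Hop back to the main thread before touching @State.
	    DispatchQueue.main.async {
	        // image(min:max:) comes from CoreMLHelpers; resizedImage(for:) is a custom extension.
	        if let observations = request.results as? [VNCoreMLFeatureValueObservation],
	           let segmentationmap = observations.first?.featureValue.multiArrayValue,
	           let segmentationMask = segmentationmap.image(min: 0, max: 1),
	           let resizedMask = segmentationMask.resizedImage(for: self.inputImage.size) {

	            // Show the mask, then blend it with the input image.
	            self.outputImage = resizedMask

	            maskInputImage()
	        }
	    }
	}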

There are a few inferences to draw from the code above:

The output returned from the Vision image analysis is an array of VNCoreMLFeatureValueObservation objects.
The MLMultiArray that contains our segmentation map resides in the feature value of the first observation in that array.
We need to convert the 2D array segmentation map into a UIImage. In order to do this, I’ve used Matthijs Hollemans’s CoreMLHelpers tools that reduce the boilerplate code we write. You can find that code at the end of this tutorial.
The segmentationmap.image(min: 0, max: 1) helper function converts the MLMultiArray to UIImage, which we can then resize to match our initial image’s size.
The resizedImage is a UIImage Swift extension that I wrote. It’s available in this gist.
The maskInputImage() function is where we’ll mask our initial image with the segmentation results to produce new backgrounds.
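
The effect of image(min: 0, max: 1) on the DeepLabV3 output can be sketched in plain Swift. The model emits one integer class label per pixel, with 0 meaning background. As far as we can tell, the helper rescales values from the min...max range to 0...255 and clamps anything outside it, so every non-zero label ends up white. The function below is a simplified stand-in written for illustration, not the CoreMLHelpers implementation.

```swift
import Foundation

/// Simplified stand-in for the helper's scaling step (an assumption about
/// its behavior): rescale from min...max to 0...255, then clamp.
func grayscaleBytes(from labels: [Int32], min: Double, max: Double) -> [UInt8] {
    return labels.map { (label: Int32) -> UInt8 in
        let scaled = (Double(label) - min) * 255 / (max - min)
        return UInt8(Swift.min(Swift.max(scaled, 0), 255))
    }
}

// A 2 x 3 patch of class labels: 0 is background, 15 is "person".
let labels: [Int32] = [0, 15, 15,
                       0,  0, 15]
let mask = grayscaleBytes(from: labels, min: 0, max: 1)
print(mask) // [0, 255, 255, 0, 0, 255]
```

This is why the mask comes out black and white even though the model distinguishes many classes: anything that is not background is treated as foreground.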

Now that our segmentation mask is ready, let’s take a look at it:

CoreML Segmentation Mask black and white — iOS simulator. Screenshot by the author.

Great! Our segmentation mask separates the foreground image from the background by using different colors for each set of pixels.

Now, it’s time to brush up on our Core Image skills to blend the mask with the image.

Modify Backgrounds by Using the Segmentation Mask

Core Image is Apple’s image processing library. It provides a wide variety of image filters to choose from.

In our case, we need to blend the segmentation mask on the original image such that the background is hidden. Also, we’d like to add a new background.

Core Image’s CIBlendWithMask filter would be perfect for our case. Here’s a look at how it operates:

CIBlendWithMask filter Apple documentation — Source: Apple Developer. Text added by the author.
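
The filter’s behavior can also be sketched per pixel. Apple describes CIBlendWithMask as using a grayscale mask to interpolate between the image and the background: a mask value of 1 gives the image, 0 gives the background. The function below models that as a linear mix on a single color channel. It is a plain-Swift illustration, not Core Image code, and the exact interpolation Core Image uses is an assumption here.

```swift
import Foundation

/// Per-channel blend in the spirit of CIBlendWithMask: a mask value of 1
/// keeps the foreground, 0 shows the background, values in between mix them.
func blend(foreground: Double, background: Double, mask: Double) -> Double {
    return mask * foreground + (1 - mask) * background
}

let subject = 0.8   // a channel value from the input photo
let backdrop = 0.2  // the same channel in the new background

print(blend(foreground: subject, background: backdrop, mask: 1)) // 0.8
print(blend(foreground: subject, background: backdrop, mask: 0)) // 0.2
```

With our black-and-white segmentation mask, white pixels keep the subject and black pixels reveal whatever background we supply.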

Let’s take a look at the maskInputImage function where our CIBlendWithMask filter runs:

	func maskInputImage() {

	    // Solid blue background with the same point size and scale as the input.
	    guard let bgImage = UIImage.imageFromColor(color: .blue, size: self.inputImage.size, scale: self.inputImage.scale),
	          let inputCG = inputImage.cgImage,
	          let backgroundCG = bgImage.cgImage,
	          let maskCG = self.outputImage.cgImage
	    else { return }

	    let beginImage = CIImage(cgImage: inputCG)
	    let background = CIImage(cgImage: backgroundCG)
	    let mask = CIImage(cgImage: maskCG)

	    // White mask pixels keep the input image; black pixels show the background.
	    if let compositeImage = CIFilter(name: "CIBlendWithMask", parameters: [
	        kCIInputImageKey: beginImage,
	        kCIInputBackgroundImageKey: background,
	        kCIInputMaskImageKey: mask])?.outputImage,
	       let filteredImageRef = CIContext(options: nil).createCGImage(compositeImage, from: compositeImage.extent) {

	        // Convert CIImage -> CGImage -> UIImage for display.
	        self.outputImage = UIImage(cgImage: filteredImageRef)
	    }
	}

Here are a few key observations about this code:

imageFromColor is a Swift extension that converts a solid color into a UIImage. We pass the color, input image size, and scale. Setting the same scale is very important to ensure the CGImage size matches our original image. Otherwise, the Core Image CIBlendWithMask filter gives a distorted result.
The CIBlendWithMask filter requires three parameter keys: kCIInputImageKey, kCIInputBackgroundImageKey, and kCIInputMaskImageKey.
The outputImage returned from the filter is a CIImage. To convert it into UIImage, we first convert it into a CGImage by using createCGImage.
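
The imageFromColor extension itself is not listed in this article. One way such a helper might look is sketched below with UIGraphicsImageRenderer. The name solidColor and its signature are assumptions made for illustration, so treat it as a starting point rather than the extension used in the project.

```swift
import UIKit

extension UIImage {
    /// Hypothetical stand-in for the article's `imageFromColor` helper.
    /// Renders a solid color at the given point size and screen scale.
    static func solidColor(_ color: UIColor, size: CGSize, scale: CGFloat) -> UIImage {
        let format = UIGraphicsImageRendererFormat()
        format.scale = scale // keeps the pixel size in step with the source photo
        let renderer = UIGraphicsImageRenderer(size: size, format: format)
        return renderer.image { context in
            color.setFill()
            context.fill(CGRect(origin: .zero, size: size))
        }
    }
}

// A 4 x 3 point image at scale 2 is backed by an 8 x 6 pixel CGImage.
let swatch = UIImage.solidColor(.blue, size: CGSize(width: 4, height: 3), scale: 2)
```

The scale matters because CIImage(cgImage:) works in pixels. If the background were rendered at a different scale than the photo, the two images would have different pixel sizes and the blend would not line up.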

The final output of an image with a changed background is shown below:

CoreML Vision Virtual background output — iOS simulator. Screenshot by the author.

Blending Gradient Backgrounds in the Image

Besides a solid color, you can add any image as the background for your subject. Let’s set gradient colors.

I’ve used this Stack Overflow answer to implement the UIImage extension for gradients. Here’s the result:

CoreML Segmentation Gradient background output — iOS simulator. Screenshot by the author.
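
As an alternative to a UIImage extension, a gradient background can also be generated directly in Core Image. The sketch below uses the built-in CILinearGradient filter. The gradientBackground function and its parameters are hypothetical names introduced for this example, and this is not the approach from the Stack Overflow answer.

```swift
import CoreImage

/// Builds a vertical two-color gradient covering `extent`.
/// Illustrative helper; not the extension used in the article.
func gradientBackground(extent: CGRect, top: CIColor, bottom: CIColor) -> CIImage? {
    // Core Image's origin is the bottom-left corner, so minY is the bottom edge.
    let filter = CIFilter(name: "CILinearGradient", parameters: [
        "inputPoint0": CIVector(x: extent.midX, y: extent.minY),
        "inputColor0": bottom,
        "inputPoint1": CIVector(x: extent.midX, y: extent.maxY),
        "inputColor1": top
    ])
    // Crop the generated gradient to the photo's extent.
    return filter?.outputImage?.cropped(to: extent)
}

let photoExtent = CGRect(x: 0, y: 0, width: 4, height: 4)
let gradient = gradientBackground(extent: photoExtent,
                                  top: CIColor(red: 0, green: 0, blue: 1),
                                  bottom: CIColor(red: 1, green: 0, blue: 0))
```

The resulting CIImage could be passed as the kCIInputBackgroundImageKey value of the CIBlendWithMask filter, using the input photo’s extent so the sizes match.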

More Fun

We saw how to use Core ML and Vision to remove and modify backgrounds in images with a SwiftUI implementation. You can do a lot more — like blurring backgrounds or hiding only a part of the background image.
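
Blurring the background, for instance, needs only one extra step: blur a copy of the photo and use it as the background of the same CIBlendWithMask filter. Here is a minimal sketch, assuming the photo and mask are already available as CIImage values of the same size. The blurBackground function name and the default radius are made up for this example.

```swift
import CoreImage

/// Keeps the subject sharp and blurs everything the mask marks as background.
/// Hypothetical helper for illustration only.
func blurBackground(of image: CIImage, mask: CIImage, radius: Double = 12) -> CIImage? {
    // Clamp first so the blur does not fade to transparent at the edges,
    // then crop back to the original extent.
    let blurred = image
        .clampedToExtent()
        .applyingFilter("CIGaussianBlur", parameters: [kCIInputRadiusKey: radius])
        .cropped(to: image.extent)

    return CIFilter(name: "CIBlendWithMask", parameters: [
        kCIInputImageKey: image,
        kCIInputBackgroundImageKey: blurred,
        kCIInputMaskImageKey: mask
    ])?.outputImage
}

// Tiny stand-in images: a red "photo" and an all-white mask.
let frame = CGRect(x: 0, y: 0, width: 8, height: 8)
let photo = CIImage(color: CIColor(red: 1, green: 0, blue: 0)).cropped(to: frame)
let whiteMask = CIImage(color: CIColor(red: 1, green: 1, blue: 1)).cropped(to: frame)
let portrait = blurBackground(of: photo, mask: whiteMask)
```

The result still has to be rendered through a CIContext, exactly as in maskInputImage, before it can be shown as a UIImage.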

Just for fun, I replaced the background with the Eiffel Tower:

Photo by Lucas Albuquerque on Unsplash. Altered with the Core ML model.

	let bgImage = UIImage(named: "tower")!.resized(to: self.inputImage.size, scale: self.inputImage.scale)

I cannot stress enough how important it is to use the same scale for the background image to ensure it fits the view.

The full source code of this project is available in this GitHub repository.

That’s it for this one. Try using an image with a group of people and see how it fares.
