How to get images from Excel documents using an API in Java

Unique images beautify Excel reports. When, for example, we receive a spreadsheet of product sales reports with relevant product images, we might walk away with a better understanding of the physical item behind the changing numbers.

When we build web applications to streamline Excel-related processes, automating workflows that extract and share relevant images between the many reports in our file storage ecosystem can significantly increase the efficiency of future projects, in much the same way that extracting and sharing real data sets can. This is especially true when we receive reports from external stakeholders containing imagery that we normally do not have direct access to. If we can work out our own way to store relevant spreadsheet images in accessible locations for our business users, or even insert those images directly into new, programmatically generated Excel files, we can transform another normally slow, manual content collaboration task into a fully automated, time-saving system.

Fortunately, the OpenXML file structure on which XLSX is based makes it easy to specify and extract image objects from specific locations within an Excel document. After all, OpenXML files are essentially standard zip archives full of neatly divided folders and files, which means programmatic navigation of the relative file path structure is at the heart of our work.

Background

To provide a little background, pictures (and other graphical elements) in Excel are rendered with instructions from the xl/drawings folder in the OpenXML XLSX file structure. The actual image file objects themselves (such as PNG or JPG files added to the document) can be traced back to the xl/media folder. When we open an Excel document with images, a markup language called DrawingML (Drawing Markup Language) dictates how and where we will see the image on a specific spreadsheet worksheet.

When programmatically extracting images from Excel documents, we simply unpack the xl/media folder and retrieve one or more objects. Because we access fully formed objects, we can also retrieve key image information in the response, including the actual file name, content type, embedded ID, and file path.
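To make the unpacking step concrete, here is a minimal sketch that uses only the Java standard library. It treats the XLSX file as an ordinary zip archive and copies every entry stored under xl/media/ into an output folder. The class name XlsxMediaExtractor and its method are hypothetical and are not part of any SDK discussed in this article. Note that this approach only recovers file names and bytes; details such as the embedded ID or the worksheet an image belongs to live in other parts of the archive and are not read here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of any SDK mentioned in this article.
class XlsxMediaExtractor {

    // Copies every file stored under xl/media/ in the XLSX archive into outputDir
    // and returns the paths of the files that were written.
    static List<Path> extractMedia(Path xlsxFile, Path outputDir) throws IOException {
        List<Path> written = new ArrayList<>();
        Files.createDirectories(outputDir);
        try (InputStream in = Files.newInputStream(xlsxFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith("xl/media/")) {
                    continue; // skip worksheets, drawings, styles, and so on
                }
                // Keep only the final path segment so files always land directly in outputDir.
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                if (fileName.isEmpty()) {
                    continue;
                }
                Path target = outputDir.resolve(fileName);
                // Reads to the end of the current zip entry without closing the stream.
                Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                written.add(target);
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java XlsxMediaExtractor report.xlsx extracted-images
        List<Path> images = extractMedia(Paths.get(args[0]), Paths.get(args[1]));
        images.forEach(System.out::println);
    }
}
```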

While we could certainly write code in different programming languages to unpack the Excel files and extract the image objects ourselves, we could – depending on the context of our own project – also benefit from a low-code API solution that would handle the operation for us.

In this article, I will demonstrate two easy-to-use API solutions that together simplify the process of extracting image objects from Excel files.

Demonstration

Using the ready-to-run Java code examples given below, we can call two APIs back-to-back that will return the file bytes for each image object stored within the XLSX file. We will only need a free API key to authorize our API calls.

These APIs perform the following actions:

Returning the image objects defined in the XLSX spreadsheet as temporary URLs
Converting a temporary URL to file bytes that can be written to a new file

In our first API call, we can optionally specify which part of the XLSX document we want to retrieve images from. We can specify these instructions by following the JSON request format example below:

{
  "InputFileBytes": "string",
  "InputFileUrl": "string",
  "WorksheetToQuery": {
    "Path": "string",
    "WorksheetName": "string"
  }
}

To customize the WorksheetToQuery object, we will of course need to know the specific file path or worksheet name we are interested in. If we do not specify a worksheet to query (i.e., if we leave the WorksheetToQuery object empty), the operation will simply return all image objects present in the entire document.
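As an illustration of the two request variants described above, the following sketch builds the request body as a plain JSON string. The class GetImagesRequestJson and its methods are hypothetical helpers, not part of the SDK used later in this article. It also assumes that omitting WorksheetToQuery entirely behaves the same as leaving it empty, which the source does not confirm.

```java
// Hypothetical helper for illustration; the SDK shown later builds the request for us.
class GetImagesRequestJson {

    // Escapes a value so it can be placed safely inside a JSON string literal.
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Builds a request that points at a public file URL.
    // Pass null for worksheetName to ask for images from the whole document
    // (assumption: omitting WorksheetToQuery is treated like leaving it empty).
    static String build(String inputFileUrl, String worksheetName) {
        StringBuilder json = new StringBuilder();
        json.append("{\n");
        json.append("  \"InputFileUrl\": \"").append(escape(inputFileUrl)).append("\"");
        if (worksheetName != null) {
            json.append(",\n  \"WorksheetToQuery\": {\n");
            json.append("    \"WorksheetName\": \"").append(escape(worksheetName)).append("\"\n");
            json.append("  }");
        }
        json.append("\n}");
        return json.toString();
    }

    public static void main(String[] args) {
        // The URL and worksheet name below are placeholders.
        System.out.println(build("https://example.com/report.xlsx", "Sheet1"));
        System.out.println(build("https://example.com/report.xlsx", null));
    }
}
```

In a real project a JSON library would usually be a safer choice than hand-built strings; the sketch only aims to show how the optional worksheet filter changes the shape of the request.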

Our API response will return one information object for each image, following the example JSON response object below:

{
  "Successful": true,
  "Images": [
    {
      "Path": "string",
      "ImageDataEmbedId": "string",
      "ImageDataContentType": "string",
      "ImageInternalFileName": "string",
      "ImageContentsURL": "string"
    }
  ]
}

To call our first API, we can start by installing the SDK. We can install with Maven by first adding a reference to the repository in pom.xml:

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

And then adding a reference to the dependency in pom.xml:

<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v4.25</version>
    </dependency>
</dependencies>

Once that's done, we can copy the imports and the function below into our file. We can paste our API key into the indicated line, and then supply our custom request object (containing either the public URL of the Excel file or the Excel file bytes):

// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.EditDocumentApi;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

EditDocumentApi apiInstance = new EditDocumentApi();
GetXlsxImagesRequest input = new GetXlsxImagesRequest(); // GetXlsxImagesRequest | Document input request
try {
    GetXlsxImagesResponse result = apiInstance.editDocumentXlsxGetImages(input);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling EditDocumentApi#editDocumentXlsxGetImages");
    e.printStackTrace();
}

Once this returns our temporary URL, we can then call the following function to convert the URL to image file bytes:

// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.EditDocumentApi;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

EditDocumentApi apiInstance = new EditDocumentApi();
FinishEditingRequest reqConfig = new FinishEditingRequest(); // FinishEditingRequest | Cloudmersive Document URL to complete editing on
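try {
    byte[] result = apiInstance.editDocumentFinishEditing(reqConfig);
    // Printing a byte[] directly only shows an object reference, so report its size instead
    System.out.println("Received " + result.length + " bytes");
} catch (ApiException e) {
    System.err.println("Exception when calling EditDocumentApi#editDocumentFinishEditing");
    e.printStackTrace();
}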

We can now write the returned image bytes to new image files (like PNG or JPEG) in our file storage system, or we can build a follow-up workflow that copies the image contents into a new document.
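As a rough sketch of that last step, the helper below writes a byte array to disk and picks a file extension from a content type string. The class ImageFileWriter and its methods are hypothetical, and the sketch assumes the content type returned for each image is a standard MIME type such as image/png; anything unrecognized falls back to a generic .bin extension.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper for illustration; not part of any SDK mentioned in this article.
class ImageFileWriter {

    // Maps a MIME content type to a file extension (assumption: values look like "image/png").
    static String extensionFor(String contentType) {
        if (contentType == null) {
            return ".bin";
        }
        switch (contentType.toLowerCase(Locale.ROOT)) {
            case "image/png":  return ".png";
            case "image/jpeg": return ".jpg";
            case "image/gif":  return ".gif";
            case "image/bmp":  return ".bmp";
            default:           return ".bin";
        }
    }

    // Writes the image bytes to directory/baseName.<ext> and returns the path written.
    // An existing file with the same name is overwritten.
    static Path writeImage(byte[] imageBytes, Path directory, String baseName, String contentType)
            throws IOException {
        Files.createDirectories(directory);
        Path target = directory.resolve(baseName + extensionFor(contentType));
        return Files.write(target, imageBytes);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes; in practice these would come from the second API call.
        byte[] imageBytes = new byte[] {0, 1, 2, 3};
        Path saved = writeImage(imageBytes, Paths.get("extracted-images"), "image1", "image/png");
        System.out.println("Saved " + saved);
    }
}
```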
