Box Diff Tool
Mark Elvers

Categories

  • OCaml, Box

Tags

  • tunbury.org

Box has an unlimited storage model but limits uploads to 1TB per month. I have been uploading various data silos and would now like to verify that all the data is present. Box has an extensive API, but I only need the "list items in folder" call.

The list-items call takes the ID of the folder you would like to query. The root of the tree is always ID 0. To check for the presence of file foo at path a/b/c/foo, we first call the API with folder ID 0. This returns a list of the entries in that folder, e.g.

{
  "entries": [
    {
      "id": "12345",
      "type": "folder",
      "name": "a"
    }
  ]
}

The API must now be called again with the ID just returned (12345 here) to get the contents of folder a. This is repeated until we finally have the entries for folder c, which should contain the file itself. I have used a Hashtbl to cache the results of each call, so each folder is listed only once no matter how many files are checked beneath it.

{
  "entries": [
    {
      "id": "78923434",
      "type": "file",
      "name": "foo"
    }
  ]
}
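
The walk described above can be sketched roughly as follows. This is a minimal illustration rather than the code from the project: the entry record, cached_items, find_path and the list_items argument are all hypothetical names, and list_items stands in for whatever function performs the real API call and decodes the JSON.

```ocaml
(* One entry from a folder listing, mirroring the JSON fields shown above. *)
type entry = { id : string; kind : string; name : string }

(* Cache of folder ID -> entries, so each folder is listed at most once. *)
let cache : (string, entry list) Hashtbl.t = Hashtbl.create 64

(* [list_items] is a hypothetical function that performs the real API call. *)
let cached_items list_items folder_id =
  match Hashtbl.find_opt cache folder_id with
  | Some entries -> entries
  | None ->
      let entries = list_items folder_id in
      Hashtbl.replace cache folder_id entries;
      entries

(* Walk a path such as "a/b/c/foo" from the root (ID "0"). Returns the
   final entry, or None as soon as a component is missing. *)
let find_path list_items path =
  let rec walk folder_id = function
    | [] -> None
    | name :: rest -> (
        let entries = cached_items list_items folder_id in
        match List.find_opt (fun e -> e.name = name) entries with
        | None -> None
        | Some e when rest = [] -> Some e
        | Some e when e.kind = "folder" -> walk e.id rest
        | Some _ -> None (* a file in the middle of the path *))
  in
  walk "0" (List.filter (fun s -> s <> "") (String.split_on_char '/' path))
```

Because the cache is keyed on folder ID, checking a second file in the same folder costs no further API calls.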

Each call defaults to returning at most 100 entries. This can be increased to a maximum of 1000 by passing ?limit=1000 to the GET request. For more results, Box offers two pagination systems: offset and marker. Offset allows you to pass a starting item number along with the call, but this is limited to 10,000 entries.

Queries with offset parameter value exceeding 10000 will be rejected with a 400 response.

To deal with folders of any size, we should use the marker system. For this, we pass ?usemarker=true to the first GET request, which causes the API to return next_marker (and prev_marker, where applicable) as additional JSON properties. Subsequent calls use ?usemarker=true&marker=XXX, where XXX is the next_marker value from the previous response. The end is detected by the absence of next_marker when no more entries are available.
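
A marker loop along these lines might look like the sketch below. The page record, query, all_entries and the fetch_page argument are hypothetical names for illustration; fetch_page is assumed to issue the GET request with the given query string and decode the response. Treating an empty next_marker the same as a missing one is also an assumption.

```ocaml
(* One page of results: the entries plus the next_marker, if any. *)
type 'a page = { entries : 'a list; next_marker : string option }

(* Build the query string for a page request. A real client should
   URL-encode the marker; that step is omitted here for brevity. *)
let query marker =
  let base = "?usemarker=true&limit=1000" in
  match marker with
  | None -> base
  | Some m -> base ^ "&marker=" ^ m

(* [fetch_page] is a hypothetical function that issues the GET request with
   the given query string and decodes the response into a page. *)
let all_entries fetch_page =
  let rec loop acc marker =
    let page = fetch_page (query marker) in
    let acc = List.rev_append page.entries acc in
    match page.next_marker with
    | Some m when m <> "" -> loop acc (Some m)
    | _ -> List.rev acc (* no next_marker: this was the last page *)
  in
  loop [] None
```

Accumulating in reverse and reversing once at the end keeps the loop tail-recursive, which matters for folders with many thousands of entries.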

The project can be found on GitHub in mtelvers/ocaml-box-diff.