Box has an unlimited storage model but has an upload limit of 1TB per month. I have been uploading various data silos but would now like to verify that the data is all present. Box has an extensive API, but I only need the list items in folder call.
The list-items call assumes that you have a folder ID which you would like to query. The root of the tree is always ID 0. To check for the presence of file foo
in a folder tree a/b/c/foo
, we need to call the API with folder ID 0. This returns a list of entries in that folder. e.g.
{
"entries": [
{
"id": "12345",
"type": "folder",
"name": "a"
}
]
}
The API must now be called again with the new ID number to get the contents of folder a
. This is repeated until we finally have the entries for folder c
which would contain the file itself. I have used a Hashtbl
to cache the results of each call.
{
"entries": [
{
"id": "78923434",
"type": "file",
"name": "foo"
}
]
}
Each call defaults to returning at most 100 entries. This can be increased to a maximum of 1000 by passing ?limit=1000
to the GET request. For more results, Box offers two pagination systems: offset
and marker
. Offset allows you to pass a starting item number along with the call, but this is limited to 10,000 entries.
Queries with offset parameter value exceeding 10000 will be rejected with a 400 response.
To deal with folders of any size, we should use the marker system. For this, we pass ?usemarker=true
to the first GET request, which causes the API to return next_marker
and prev_marker
as required as additional JSON properties. Subsequent calls would use ?usemarker=true&marker=XXX
. The end is detected by the absence of the next_marker
when no more entries are available.
The project can be found on GitHub in mtelvers/ocaml-box-diff.