Sources provide access to source images, translating request URI identifiers into source image locators, such as pathnames, in a particular type of underlying storage. After verifying that an image exists and is accessible, and making a best guess as to its format, a source can provide access to it to other application components in a generalized way.
In a simple configuration, one source supplies all requests. But it's also possible to select a source dynamically depending on the image identifier.
When the source.static configuration key is set to the name of a source, that source will supply all requests.
When a static source is not flexible enough, it is also possible to serve images from different sources. For example, you may have some images stored on a filesystem, and others stored in an S3 bucket. If you can differentiate their sources based on their identifier in code—either by analyzing the identifier string, or performing some kind of service request—you can implement a delegate method to tell the image server from which source it should obtain the image.
To enable dynamic source selection, set the source.delegate configuration key to true, and implement the source() delegate method. For example:
class CustomDelegate
  def source(options = {})
    identifier = context['identifier']
    # Here, you would perform some kind of analysis on `identifier`:
    # parse it, look it up in a web service or database...
    # and then return the name of the source to supply it.
    "FilesystemSource"
  end
end
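As a slightly more concrete illustration, the sketch below routes on an identifier prefix. It is only a sketch under assumptions: the `s3-` prefix is a hypothetical naming convention, and the `attr_accessor :context` line is included so the class is self-contained outside the image server.

```ruby
class CustomDelegate
  # Assumption: the image server assigns the request context before
  # invoking delegate methods; declared here so the sketch runs standalone.
  attr_accessor :context

  def source(options = {})
    identifier = context['identifier']
    # Hypothetical convention: identifiers starting with "s3-" live in S3,
    # and everything else lives on the filesystem.
    if identifier.start_with?('s3-')
      'S3Source'
    else
      'FilesystemSource'
    end
  end
end

delegate = CustomDelegate.new
delegate.context = { 'identifier' => 's3-scan-001.jpg' }
puts delegate.source
```

A prefix check is cheap because it needs no network or database round trip; a lookup against an external service would follow the same shape but add latency to every request.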
| I want to serve images located… | …and… | Source to use |
|---|---|---|
| On a filesystem… | …and the identifiers I use in URLs will correspond predictably to filesystem paths | FilesystemSource with BasicLookupStrategy |
| On a filesystem… | …and filesystem paths will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier | FilesystemSource with ScriptLookupStrategy |
| On a web server… | …and the identifiers I use in URLs will correspond predictably to URL paths | HTTPSource with BasicLookupStrategy |
| On a web server… | …and URL paths will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier | HTTPSource with ScriptLookupStrategy |
| In S3… | …and the identifiers I use in URLs will correspond predictably to object keys | S3Source with BasicLookupStrategy |
| In S3… | …and object keys will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier | S3Source with ScriptLookupStrategy |
| In Azure Blob Storage… | …and the identifiers I use in URLs will correspond predictably to object keys | AzureBlobStorageSource with BasicLookupStrategy |
| In Azure Blob Storage… | …and object keys will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier | AzureBlobStorageSource with ScriptLookupStrategy |
| In a SQL database… | …where they are stored in BLOB columns | JDBCSource |
| In a SQL database… | …where they are stored in PostgreSQL using its Large Object feature | PostgreSQLSource |
FilesystemSource maps URL identifiers to filesystem paths. This is often the most performant source to use, as filesystems tend to offer an appealing combination of high throughput, low latency, and efficient random access.
Two distinct lookup strategies are supported, defined by the source.FilesystemSource.lookup_strategy configuration option.
BasicLookupStrategy locates images by concatenating an identifier and a pre-defined path prefix and/or suffix. For example, with the following configuration options set:
# Note trailing slash!
source.FilesystemSource.BasicLookupStrategy.path_prefix: /usr/local/images/
source.FilesystemSource.BasicLookupStrategy.path_suffix:
An identifier of image.jpg in the URL will resolve to /usr/local/images/image.jpg.
It's also possible to include a partial path in the identifier using URL-encoded slashes (%2F) as path separators. subdirectory%2Fimage.jpg in the URL would then resolve to /usr/local/images/subdirectory/image.jpg.
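If encoded slashes cannot reach the image server unchanged (for example, because a reverse proxy decodes them), see the slash_substitute configuration key.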
To prevent arbitrary directory traversal, BasicLookupStrategy will recursively strip out ../, /.., ..\, and \.. from identifiers before resolving the path.
It is also advisable to set source.FilesystemSource.BasicLookupStrategy.path_prefix to the deepest possible path. The shallower the path, the more of the filesystem will be exposed.
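The resolution and sanitization described above can be sketched as follows. This is an illustration only, not the server's actual implementation: the helper names `strip_traversal` and `resolve` are hypothetical, and the prefix and suffix values mirror the configuration example.

```ruby
# Values taken from the configuration example above.
PATH_PREFIX = '/usr/local/images/'
PATH_SUFFIX = ''

# The four sequences that are stripped from identifiers.
TRAVERSAL_SEQUENCES = ['../', '/..', '..\\', '\\..'].freeze

# Hypothetical helper: removes the sequences repeatedly until the string
# stops changing, so that removing one occurrence cannot assemble another.
def strip_traversal(identifier)
  previous = nil
  current = identifier
  until current == previous
    previous = current
    TRAVERSAL_SEQUENCES.each { |seq| current = current.gsub(seq, '') }
  end
  current
end

# Hypothetical helper: concatenates prefix, sanitized identifier, and suffix.
def resolve(identifier)
  PATH_PREFIX + strip_traversal(identifier) + PATH_SUFFIX
end

puts resolve('image.jpg')
puts resolve('subdirectory/image.jpg')
puts resolve('....//secret.jpg')
```

Repeating the removal matters because a single pass over `....//secret.jpg` would leave `../secret.jpg` behind.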
Sometimes, BasicLookupStrategy will not offer enough control. Perhaps you want to serve images from multiple filesystems, or perhaps your identifiers are opaque and you need to perform a database or web service request to locate the corresponding images. With ScriptLookupStrategy, you can tell FilesystemSource to invoke a delegate method and capture the pathname it returns.
The delegate method, filesystemsource_pathname(), should return a pathname if available, or nil if not. Examples follow:
require 'java'
java_import 'org.postgresql.Driver'
java_import 'java.sql.DriverManager'

class CustomDelegate
  JDBC_URL = 'jdbc:postgresql://localhost:5432/mydatabase'
  JDBC_USER = 'myuser'
  JDBC_PASSWORD = 'mypassword'

  # By making the connection static, we can avoid reconnecting every time
  # the method is called, which would be expensive.
  # See: https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/DriverManager.html
  @@conn = DriverManager.get_connection(JDBC_URL, JDBC_USER, JDBC_PASSWORD)

  def filesystemsource_pathname(options = {})
    identifier = context['identifier']
    begin
      # Note the use of prepared statements, which are safer than
      # string concatenation.
      sql = 'SELECT pathname FROM images WHERE identifier = ? LIMIT 1'
      stmt = @@conn.prepare_statement(sql)
      stmt.set_string(1, identifier)
      results = stmt.execute_query
      # No matching row means there is no image for this identifier.
      return nil unless results.next
      pathname = results.get_string(1)
      # Treat an empty string the same as a missing pathname.
      return (pathname.nil? || pathname.empty?) ? nil : pathname
    ensure
      stmt&.close
    end
  end
end
Note that several common Ruby database libraries (like the mysql and pgsql gems) use native extensions. These won't work in JRuby. Instead, the example above uses the JDBC API via the JRuby-Java bridge. For this to work, a JDBC driver for your database must be available on the Java classpath, and referenced in a java_import statement.
This very simple imaginary web service returns a pathname in the response body when an image exists, and an empty response body if not.
require 'net/http'
require 'cgi'

class CustomDelegate
  def filesystemsource_pathname(options = {})
    identifier = context['identifier']
    uri = URI.parse('http://example.org/webservice/' + CGI.escape(identifier))
    http = Net::HTTP.new(uri.host, uri.port)
    request = Net::HTTP::Get.new(uri.request_uri)
    response = http.request(request)
    return nil if response.code.to_i >= 400
    # An empty (or whitespace-only) body means the image does not exist.
    body = response.body.to_s.strip
    body.empty? ? nil : body
  end
end
Like all sources, FilesystemSource needs to be able to figure out the format of a source image before it can be served. It uses the following strategy to do this:
HTTPSource maps URL identifiers to HTTP or HTTPS resources, for retrieving images hosted on a web server.
HTTPSource supports two distinct lookup strategies, defined by the source.HTTPSource.lookup_strategy configuration option.
BasicLookupStrategy locates images by concatenating an identifier with a pre-defined URL prefix and/or suffix. For example, with the following configuration options set:
# Note trailing slash!
source.HTTPSource.BasicLookupStrategy.url_prefix: http://example.org/images/
source.HTTPSource.BasicLookupStrategy.url_suffix:
An identifier of image.jpg in the URL will resolve to http://example.org/images/image.jpg.
A partial path can be included in the identifier by URL-encoding the path separator slashes (%2F). subpath%2Fimage.jpg in the URL would then resolve to http://example.org/images/subpath/image.jpg.
It's also possible to use a full URL as an identifier by leaving both of the above keys blank. In that case, an identifier of http%3A%2F%2Fexample.org%2Fimages%2Fimage.jpg in the URL will resolve to http://example.org/images/image.jpg.
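For client code that builds request URLs, the encoded identifiers shown above can be produced with Ruby's standard library. This is a minimal sketch; `encode_identifier` is a hypothetical helper name, and it assumes identifiers without spaces, since `CGI.escape` encodes a space as `+` rather than `%20`.

```ruby
require 'cgi'

# Hypothetical helper: percent-encodes an identifier so that it can be
# placed in a single URL path segment.
def encode_identifier(identifier)
  CGI.escape(identifier)
end

puts encode_identifier('subpath/image.jpg')
puts encode_identifier('http://example.org/images/image.jpg')
```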
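If encoded slashes cannot reach the image server unchanged (for example, because a reverse proxy decodes them), see the slash_substitute configuration key.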
Sometimes, BasicLookupStrategy will not offer enough control. Perhaps you want to serve images from multiple URLs, or perhaps your identifiers are opaque and you need to run a database or web service request to locate them. With ScriptLookupStrategy, you can tell HTTPSource to invoke the httpsource_resource_info() delegate method and capture the request info (URL and optionally authentication credentials and/or request headers) it returns.
See the FilesystemSource ScriptLookupStrategy section for examples of similar methods.
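For orientation, a delegate method of this kind might look like the sketch below. The hash key names ('uri', 'username', 'secret', 'headers') are assumptions that should be checked against the sample delegate script shipped with your version; the host names, the `private-` prefix, and the credentials are hypothetical, and `attr_accessor :context` is included only to make the sketch self-contained.

```ruby
class CustomDelegate
  # Assumption: the image server assigns the request context before
  # invoking delegate methods; declared here so the sketch runs standalone.
  attr_accessor :context

  def httpsource_resource_info(options = {})
    identifier = context['identifier']
    if identifier.start_with?('private-')
      # Hypothetical protected host requiring credentials and a header.
      {
        'uri'      => "https://private.example.org/images/#{identifier}",
        'username' => 'myuser',
        'secret'   => 'mypassword',
        'headers'  => { 'X-Custom-Header' => 'value' }
      }
    else
      # Hypothetical public host; only the URL is needed.
      { 'uri' => "http://example.org/images/#{identifier}" }
    end
  end
end

delegate = CustomDelegate.new
delegate.context = { 'identifier' => 'image.jpg' }
puts delegate.httpsource_resource_info['uri']
```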
While proceeding through the client request fulfillment flow, this source issues the following server requests:
1. If HTTPSource.BasicLookupStrategy.send_head_requests is set to true, or the delegate method returns true for the equivalent key, a HEAD request. Otherwise, a ranged GET request specifying a small range of the beginning of the resource.
2. If a HEAD request was sent, one or more of the following:
    - a GET request
    - multiple GET requests
    - a GET to retrieve the full image bytes
HTTP Basic authentication is supported.
With BasicLookupStrategy, credentials are set in the source.HTTPSource.BasicLookupStrategy.auth.basic.username and source.HTTPSource.BasicLookupStrategy.auth.basic.secret configuration keys.
Like all sources, HTTPSource needs to be able to figure out the format of a source image before it can be served. It uses the strategy below to do this.
- If the HEAD response contains a Content-Type header with a recognized value that is specific enough (not application/octet-stream, for example), a format is inferred from that.
- If the HEAD response contains an Accept-Ranges: bytes header, a GET request is sent containing a Range header specifying a small range of data from the beginning of the resource, and a format is inferred from the magic bytes in the response entity.
This source supports random access by requesting small chunks of image data as needed, as opposed to all of it. This may improve efficiency, possibly massively, when reading small portions of large images in certain formats (see below). Conversely, it may reduce efficiency when reading large portions of images.
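To make the idea of chunked access concrete, the sketch below computes the Range header values a client could send to read a region of a resource in fixed-size pieces. It illustrates the general technique only and is not the server's own chunking logic: `range_headers` is a hypothetical helper, and the chunk size is an arbitrary illustrative value.

```ruby
# Hypothetical helper: splits the byte region starting at `offset` and
# spanning `length` bytes into Range header values of at most
# `chunk_size` bytes each.
def range_headers(offset, length, chunk_size)
  last = offset + length - 1
  (offset..last).step(chunk_size).map do |start|
    # The final chunk may be shorter than chunk_size.
    finish = [start + chunk_size - 1, last].min
    "bytes=#{start}-#{finish}"
  end
end

# Reading the first 1000 bytes in 512-byte chunks takes two requests.
range_headers(0, 1000, 512).each { |value| puts "Range: #{value}" }
```

Each value would go into the Range header of a separate GET request, which is why reading a large portion of an image this way can cost more round trips than a single full download.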
In order for this technique to work:
- The source.HTTPSource.chunking.enabled configuration key must be set to true;
- The server must support the Range header, as advertised by the presence of an Accept-Ranges: bytes header in a HEAD response.
See Plugins.