Resolve HTTP URI for Azure Data Lake
I am creating an Azure Data Lake storage integration for Label Studio. The Django backend resolves cloud storage URIs to HTTP addresses, using pre-signed URLs for objects such as images. This lets the backend read files regardless of the cloud storage provider. However, I have not been able to create the pre-signed URLs for Azure Data Lake. Any suggestions?
The test that should succeed is shown below:
- name: stage
  request:
    method: GET
    url: '{django_live_url}/api/projects/{project_pk}/next'
  response:
    json:
      data:
        image_url: !re_match "/tasks/\\d+/presign/\\?fileuri=YXp1cmUtYmxvYjovL3B5dGVzdC1henVyZS1pbWFnZXMvYWJj"
    status_code: 200
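For reference, the fileuri value in the expected pattern looks like a base64-encoded storage URI. A minimal sketch to decode it, assuming URL-safe base64 (the variable names are only illustrative):

```python
import base64

# fileuri value copied from the expected image_url pattern in the test
encoded = "YXp1cmUtYmxvYjovL3B5dGVzdC1henVyZS1pbWFnZXMvYWJj"

# Decode the base64 payload back into the storage URI it represents
decoded = base64.urlsafe_b64decode(encoded).decode("utf-8")
print(decoded)  # azure-blob://pytest-azure-images/abc
```

Note that the decoded URI uses the azure-blob scheme, so the expected value in this test may need adjusting for a storage class that uses a different URL scheme.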
The test above fails: the endpoint returns "azure-spi://<path/to/file>" instead of "/tasks/\d+/presign/?fileuri=YX..."
This means I am not resolving URIs the way the backend expects. The issue is that I don't know whether HTTP URIs are supported by Azure Data Lake with service principal authentication. See the resolve_uri function I am trying in the class below:
class AzureServicePrincipalImportStorageBase(AzureServicePrincipalStorageMixin, ImportStorage):
    url_scheme = 'azure_spi'

    presign = models.BooleanField(_('presign'), default=True, help_text='Generate presigned URLs')
    presign_ttl = models.PositiveSmallIntegerField(
        _('presign_ttl'), default=1, help_text='Presigned URLs TTL (in minutes)'
    )

    (…)

    def generate_http_url(self, url):
        match = re.match(AZURE_URL_PATTERN, url)
        if match:
            match_dict = match.groupdict()
            sas_token = self.get_sas_token(match_dict['blob_name'])
            url = f"{self.get_account_url()}/{self.container}/{match_dict['blob_name']}?{sas_token}"
        return url

    (…)

    def resolve_uri(self, uri, task=None):
        # list of objects
        if isinstance(uri, list):
            resolved = []
            for item in uri:
                result = self.resolve_uri(item, task)
                resolved.append(result if result else item)
            return resolved
        # dict of objects
        elif isinstance(uri, dict):
            resolved = {}
            for key in uri.keys():
                result = self.resolve_uri(uri[key], task)
                resolved[key] = result if result else uri[key]
            return resolved
        elif isinstance(uri, str):
            try:
                # extract uri first from task data
                if self.presign and task is not None:
                    sig = urlparse(uri)
                    if sig.query != '':
                        return uri
                # resolve uri to url using storages
                http_url = self.generate_http_url(uri)
                return http_url
            except Exception:
                logger.info(f"Can't resolve URI={uri}", exc_info=True)

    class Meta:
        abstract = True
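For comparison, the value the test expects is not a direct storage URL but a backend path that carries the original URI as a base64 parameter. A minimal sketch of building such a path, assuming URL-safe base64 and a task id of 7; build_presign_path is a hypothetical helper for illustration, not a Label Studio function:

```python
import base64
import re


def build_presign_path(task_id: int, uri: str) -> str:
    """Hypothetical helper: wrap a storage URI in a /tasks/<id>/presign/ path."""
    token = base64.urlsafe_b64encode(uri.encode("utf-8")).decode("ascii")
    return f"/tasks/{task_id}/presign/?fileuri={token}"


# URI and task id are illustrative; the pattern is the one from the test,
# with the YAML escaping removed
path = build_presign_path(7, "azure-blob://pytest-azure-images/abc")
expected = r"/tasks/\d+/presign/\?fileuri=YXp1cmUtYmxvYjovL3B5dGVzdC1henVyZS1pbWFnZXMvYWJj"
print(bool(re.match(expected, path)))  # True
```

If the backend works this way, resolve_uri would return a path of this shape when presign is enabled and a task is given, and the SAS URL from generate_http_url would only be produced later, when the presign endpoint is requested.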