Resolve HTTP URIs for Azure Data Lake

I am creating an Azure Data Lake Storage integration for Label Studio. The Django backend resolves cloud storage URIs to HTTP addresses (pre-signed URLs) for objects such as images, which lets it read files regardless of the cloud storage provider. However, I have not been able to create pre-signed URLs for Azure Data Lake. Any suggestions?
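As a rough illustration of the URI-to-HTTP mapping described above, the sketch below turns a storage URI of the form `<scheme>://<container>/<blob path>` into an HTTPS URL with a SAS query string appended. It is only a sketch under stated assumptions: `storage_uri_to_http_url` is a hypothetical helper, the account URL is a placeholder, and the SAS token is assumed to be generated elsewhere (this code does not create or validate it).

```python
from urllib.parse import quote, urlparse


def storage_uri_to_http_url(uri: str, account_url: str, sas_token: str) -> str:
    """Hypothetical helper: map '<scheme>://<container>/<blob path>' to a SAS URL.

    Assumes the URI's netloc is the container, its path is the blob name,
    and `sas_token` is an already generated query string without a leading '?'.
    """
    parsed = urlparse(uri)
    container = parsed.netloc
    blob_name = parsed.path.lstrip("/")
    if not parsed.scheme or not container or not blob_name:
        raise ValueError(f"Cannot parse storage URI: {uri!r}")
    # Percent-encode the blob path (e.g. spaces) while keeping '/' separators.
    return f"{account_url.rstrip('/')}/{container}/{quote(blob_name)}?{sas_token}"


print(storage_uri_to_http_url(
    "azure-spi://images/path/to/my file.png",
    "https://myaccount.blob.core.windows.net",  # placeholder account URL
    "sv=2021-08-06&sig=PLACEHOLDER",            # placeholder SAS token
))
```

Keeping token generation out of the helper makes the string handling easy to test on its own; how the token is obtained (account key, user delegation key, etc.) is a separate concern.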

The test that should succeed is shown below:

- name: stage
  request:
    method: GET
    url: '{django_live_url}/api/projects/{project_pk}/next'
  response:
    json:
      data:
        image_url: !re_match "/tasks/\\d+/presign/\\?fileuri=YXp1cmUtYmxvYjovL3B5dGVzdC1henVyZS1pbWFnZXMvYWJj"
    status_code: 200
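The `fileuri` parameter in the expected URL is base64-encoded. Decoding it with the standard library shows which storage URI the test expects; note that it uses an `azure-blob://` scheme, not the `azure-spi://` scheme the endpoint currently returns.

```python
import base64

# fileuri value copied from the test's expected image_url
encoded = "YXp1cmUtYmxvYjovL3B5dGVzdC1henVyZS1pbWFnZXMvYWJj"
decoded = base64.urlsafe_b64decode(encoded).decode("utf-8")
print(decoded)  # azure-blob://pytest-azure-images/abc
```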

The test above fails: the endpoint returns "azure-spi://<path/to/file>" instead of "/tasks/\d+/presign/?fileuri=YX..."
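For comparison, the expected value has the shape `/tasks/<task id>/presign/?fileuri=<base64 of the storage URI>`. The minimal sketch below builds a string of that shape and checks it against the same regular expression the test uses. `presign_proxy_url` is a hypothetical name, and URL-safe base64 is an assumption consistent with the test value, not a confirmed detail of Label Studio's implementation.

```python
import base64
import re


def presign_proxy_url(task_id: int, storage_uri: str) -> str:
    """Hypothetical helper: build a relative presign proxy URL for a task."""
    encoded = base64.urlsafe_b64encode(storage_uri.encode("utf-8")).decode("ascii")
    return f"/tasks/{task_id}/presign/?fileuri={encoded}"


url = presign_proxy_url(42, "azure-blob://pytest-azure-images/abc")
# Same pattern as the test's !re_match, with the YAML escaping removed
pattern = r"/tasks/\d+/presign/\?fileuri=YXp1cmUtYmxvYjovL3B5dGVzdC1henVyZS1pbWFnZXMvYWJj"
print(url, bool(re.match(pattern, url)))
```

The backend would then decode `fileuri` again when the proxy endpoint is hit and generate the real signed URL at that point, which is why the task data itself only needs to carry this relative path.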

This means I am not resolving URIs the way the backend expects. The problem is that I don't know whether HTTP URIs are supported by Azure Data Lake with service principal authentication. See the resolve_uri function I am trying in the class below:


class AzureServicePrincipalImportStorageBase(AzureServicePrincipalStorageMixin, ImportStorage):
    url_scheme = 'azure_spi'

    presign = models.BooleanField(_('presign'), default=True, help_text='Generate presigned URLs')
    presign_ttl = models.PositiveSmallIntegerField(
        _('presign_ttl'), default=1, help_text='Presigned URLs TTL (in minutes)'
    )
(…)
    def generate_http_url(self, url):
        match = re.match(AZURE_URL_PATTERN, url)
        if match:
            match_dict = match.groupdict()
            sas_token = self.get_sas_token(match_dict['blob_name'])
            url = f"{self.get_account_url()}/{self.container}/{match_dict['blob_name']}?{sas_token}"
        return url
(…)
    def resolve_uri(self, uri, task=None):
        # list of objects
        if isinstance(uri, list):
            resolved = []
            for item in uri:
                result = self.resolve_uri(item, task)
                resolved.append(result if result else item)
            return resolved

        # dict of objects
        elif isinstance(uri, dict):
            resolved = {}
            for key in uri.keys():
                result = self.resolve_uri(uri[key], task)
                resolved[key] = result if result else uri[key]
            return resolved
        elif isinstance(uri, str):
            try:
                # extract uri first from task data
                if self.presign and task is not None:
                    sig = urlparse(uri)
                    if sig.query != '':
                        return uri
                # resolve uri to url using storages
                http_url = self.generate_http_url(uri)
                return http_url

            except Exception:
                logger.info(f"Can't resolve URI={uri}", exc_info=True)

    class Meta:
        abstract = True
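One detail in the class above that may be worth checking: url_scheme is 'azure_spi' (underscore), while the endpoint returns azure-spi:// URIs (hyphen). RFC 3986 only allows letters, digits, `+`, `-` and `.` in a scheme, so Python's `urllib.parse.urlparse` does not recognise `azure_spi` as a scheme at all. resolve_uri only reads `.query`, which is parsed either way, but any code relying on `.scheme` or `.netloc` would see empty strings. Whether Label Studio's own URI matching is affected is not confirmed here; the snippet only demonstrates the standard-library behaviour.

```python
from urllib.parse import urlparse

for uri in ("azure_spi://container/path/to/file.png",
            "azure-spi://container/path/to/file.png"):
    parts = urlparse(uri)
    # With '_' the scheme is not recognised and the whole string lands in .path.
    print(repr(parts.scheme), repr(parts.netloc), repr(parts.path))
```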
