Task #18090

CKAN: implement a patch (and update the provisioning) to solve the Bug #18070

Added by Francesco Mangiacrapa 2 months ago. Updated 2 months ago.

Status: Closed
Start date: Nov 15, 2019
Priority: Normal
Due date:
Assignee: Francesco Mangiacrapa
% Done: 100%
Category: Other
Sprint: Data Catalogue
Milestones:
Duration:

Description

We need to implement and release a patch to fix the CKAN issue reported at #18070


Related issues

Related to StocksAndFisheriesKB - Bug #18070: Tags containing a double space (maybe multiple spaces) ar... Closed Nov 14, 2019

History

#1 Updated by Francesco Mangiacrapa 2 months ago

  • Related to Bug #18070: Tags containing a double space (maybe multiple spaces) are not searchable added

#2 Updated by Francesco Mangiacrapa 2 months ago

  • % Done changed from 0 to 50

By debugging the CKAN source code, I found the bug in the following snippet (in the file https://github.com/ckan/ckan/blob/2.6/ckan/logic/action/get.py#L1841):

1. if capacity_fq:
2.    fq = ' '.join(p for p in fq.split() if 'capacity:' not in p)
3.    data_dict['fq'] = capacity_fq + ' ' + fq

As I understand it, line 2:

fq = ' '.join(p for p in fq.split() if 'capacity:' not in p)

should drop the 'capacity:' parameter and return the remaining ones (in the fq variable).

e.g. fq = 'capacity:"public" tags:"Code 21.3.P.s System FAO Name Atlantic Northwest  21.3.P.s" +dataset_type:dataset -dataset_type:harvest'

It should return:

   tags:"Code 21.3.P.s System FAO Name Atlantic Northwest  21.3.P.s" +dataset_type:dataset -dataset_type:harvest

But the code splits on any whitespace and re-joins with a single space, so runs of multiple spaces are collapsed and the query is altered.

In fact, it returns (dropping one space between 'Northwest' and '21.3.P.s'):

   tags:"Code 21.3.P.s System FAO Name Atlantic Northwest 21.3.P.s" +dataset_type:dataset -dataset_type:harvest
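
The collapse can be reproduced outside CKAN. The following is a minimal sketch that runs line 2 of the snippet on the example query from this comment; the variable name `filtered` is only illustrative.

```python
# Example filter query from this ticket; note the two spaces after "Northwest".
fq = ('capacity:"public" '
      'tags:"Code 21.3.P.s System FAO Name Atlantic Northwest  21.3.P.s" '
      '+dataset_type:dataset -dataset_type:harvest')

# Line 2 of the snippet: split on whitespace, drop capacity tokens, re-join.
# str.split() with no argument treats any run of whitespace as one separator,
# so the double space inside the quoted tag value cannot survive the re-join.
filtered = ' '.join(p for p in fq.split() if 'capacity:' not in p)

print(filtered)
print('Northwest  21.3.P.s' in filtered)  # double space still present?
```

The second print shows False: the quoted tag value no longer matches the tag as it is stored.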

I'm going to replace it with this patch:

if 'capacity:' in fq:
   capacity_index = fq.find('capacity:')
   the_capacity_parameter = fq[capacity_index:].split()[0]
   fq = fq.replace(the_capacity_parameter, '')

which correctly returns:

tags:"Code 21.3.P.s System FAO Name Atlantic Northwest  21.3.P.s" +dataset_type:dataset -dataset_type:harvest
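
As a quick check, the patch can be wrapped in a function and run on the same example. This is only a sketch: the function name `strip_capacity` is hypothetical and does not exist in CKAN.

```python
def strip_capacity(fq):
    """Remove the capacity parameter from fq, leaving the rest untouched."""
    if 'capacity:' in fq:
        capacity_index = fq.find('capacity:')
        # Take the text from 'capacity:' up to the next whitespace.
        the_capacity_parameter = fq[capacity_index:].split()[0]
        # Cut only that substring; the rest of fq is never split or re-joined.
        fq = fq.replace(the_capacity_parameter, '')
    return fq


fq = ('capacity:"public" '
      'tags:"Code 21.3.P.s System FAO Name Atlantic Northwest  21.3.P.s" '
      '+dataset_type:dataset -dataset_type:harvest')

patched = strip_capacity(fq)
print(repr(patched))
```

The double space in the tag value is preserved. Two details are worth keeping in mind: the separator that followed the removed parameter stays in place (here as a leading space), and a capacity value that itself contained whitespace would be removed only up to its first space.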

#3 Updated by Francesco Mangiacrapa 2 months ago

  • % Done changed from 50 to 100

I just added the following task to d4science-ghn-cluster/roles/ckan/tasks/main.yml

- name: Apply a patch that fixing the filtering for tags containing double or multiple spaces. See ticket 18090
  patch: src=get.py-20191122.diff basedir=/usr/lib/ckan/default/src state=present strip=0
  notify: apache2 reload

and created the patch file (roles/ckan/files/get.py-20191122.diff) with the diff command:

--- ckan/ckan/logic/action/get.py.orig  2019-11-22 17:21:26.145837799 +0100
+++ ckan/ckan/logic/action/get.py   2019-11-22 17:19:52.621998269 +0100
@@ -1849,11 +1849,23 @@
                 capacity_fq = '({0} OR creator_user_id:({1}))'.format(
                     capacity_fq,
                     authz.get_user_id_for_username(user))
-
+        
         if capacity_fq:
-            fq = ' '.join(p for p in fq.split() if 'capacity:' not in p)
+            #fq = ' '.join(p for p in fq.split() if 'capacity:' not in p)
+            #data_dict['fq'] = fq + ' ' + capacity_fq
+            if 'capacity:' in fq:
+                log.info('Starting patch #18070 to replace capacity and fix searching tag with multiple spaces')
+                capacity_index = fq.find('capacity:')
+                #log.debug('capacity_index: ' + str(capacity_index))
+                the_capacity_parameter = fq[capacity_index:].split()[0]
+                #log.debug('split_to_capacity: ' + str(the_capacity_parameter))
+                fq = fq.replace(the_capacity_parameter, '')
+               
+            #log.debug("the fq after: "+str(fq))
             data_dict['fq'] = capacity_fq + ' ' + fq

+        log.debug("data_dict['fq']: "+str(data_dict['fq']))
+
         fq = data_dict.get('fq', '')
         if include_drafts:
             user_id = authz.get_user_id_for_username(user, allow_none=True)

I ran the playbook to apply the patch to ckan-d, ckan.pre and ckan-grsf.pre, and it worked fine: the patch was applied to all the listed CKAN instances.

Just for testing, I created a dataset containing a tag with multiple spaces (see https://ckan.pre.d4science.org/dataset?tags=a+tag+with+++++N-spaces), and the CKAN filtering now works well.
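
To see which tag value that test URL actually requests, the query string can be decoded with the Python 3 standard library. This is a sketch for illustration only; CKAN 2.6 itself runs on Python 2, and its own request parsing is not shown here.

```python
from urllib.parse import urlsplit, parse_qs

url = 'https://ckan.pre.d4science.org/dataset?tags=a+tag+with+++++N-spaces'

# In a query string each '+' decodes to one space, so the run of five '+'
# characters becomes five consecutive spaces inside the tag value.
tags = parse_qs(urlsplit(url).query)['tags']

print(repr(tags[0]))
```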

Next actions:
1. I'm going to push the playbook changes to master...
2. We can schedule the upgrade of production CKANs

#4 Updated by Francesco Mangiacrapa 2 months ago

  • Status changed from In Progress to Closed

I just released to production (for GRSF-ADMIN and GRSF) the patch implemented to fix Bug #18070.
