Bring2mind Forums

search and special character
Last Post 12/22/2010 1:23 PM by Peter Donker. 19 Replies.
mangiov
New Member
Posts:6


--
11/12/2009 1:53 PM

Hi,

We are running into an issue where, if the user searches for a word that contains special characters, the search returns no results.

For example, there is a product called "hi-gloss"; if the user searches for "hi-gloss", no results are returned; if the user enters "hi gloss" (without the minus character) the search works, although there are more documents returned than expected.

Searching on the internet about how Lucene indexes its content, I found a couple of links that explained that some characters should be escaped to be searched correctly.

Am I doing something wrong?

 

Thanks,

vincenzo 
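
For reference, "escaping" in the classic Lucene query syntax means putting a backslash in front of each reserved character. Below is a minimal Python sketch, assuming the reserved set from the Lucene query parser documentation (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \). The helper name `escape_lucene_term` is hypothetical and is not part of DMX or Lucene; whether an escaped term then matches still depends on how the analyzer tokenized the indexed text.

```python
# Hypothetical helper: backslash-escape the characters that the classic
# Lucene query parser treats as operators. Not part of DMX or Lucene.
LUCENE_RESERVED = set('+-!(){}[]^"~*?:\\')


def escape_lucene_term(text):
    """Return text with every reserved character preceded by a backslash."""
    escaped = []
    i = 0
    while i < len(text):
        # "&&" and "||" are two-character operators; escape both characters.
        if text[i:i + 2] in ("&&", "||"):
            escaped.append("\\" + text[i] + "\\" + text[i + 1])
            i += 2
            continue
        ch = text[i]
        escaped.append("\\" + ch if ch in LUCENE_RESERVED else ch)
        i += 1
    return "".join(escaped)


print(escape_lucene_term("hi-gloss"))  # prints: hi\-gloss
```

The escaped string would be passed to the query parser in place of the raw user input.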

Peter Donker
Veteran Member
Posts:4536


--
11/19/2009 1:11 PM
Hi Vincenzo,

I'll make a note of the escaping.

Peter
Brian Cogswell
New Member
Posts:16


--
01/13/2010 5:20 PM

Peter,

Just wanted to add a note that I am experiencing the same issue with the Global Attributes.  We are using a document control number that is in the format "OP-##-####".  If I search on "OP" it works fine, but once I add the hyphen, no results are returned.

Brian

Brian Cogswell
New Member
Posts:16


--
02/19/2010 8:16 PM

Peter,

This issue shows as being resolved. I have version 05.02.02 and it is still an issue for me.  I have a custom attribute called "Control#".  An example would be "OP-21".  When I search on "OP-21" nothing is returned.  When I search on "OP" that file is returned with a lot of other files.

Brian

 

Peter Donker
Veteran Member
Posts:4536


--
03/15/2010 10:43 AM
Hi Brian,

I'll reopen the ticket and try to reproduce it.

Peter
Brian Cogswell
New Member
Posts:16


--
04/08/2010 3:15 AM

Peter,

I was hoping this was resolved in 5.02.04 but it looks like having a "-" in a Custom field does not work with the Advanced Search feature.

Brian

 

Peter Donker
Veteran Member
Posts:4536


--
04/12/2010 1:32 PM
Hi Brian,

I'll make a note of it. It is a re-opened ticket.

Peter
Brian Cogswell
New Member
Posts:16


--
07/15/2010 8:18 PM
Peter,

Any update on when you may be able to tackle this? I saw some reference to this in version 5.02.05 but I still do not get any results returned when there is a '-' in the string. I am currently on 5.02.06 and will upgrade to 5.02.09 but I don't see that issue addressed in the last 3 versions.

Brian
Peter Donker
Veteran Member
Posts:4536


--
07/21/2010 9:02 PM
Good point Brian. It slipped from the radar. I'll really push it up now.

Peter
Brendon
New Member
Posts:4


--
12/03/2010 10:06 PM
Hi

Bring2mind.DMX_05.03.03 installed, re-index done

This still seems to be an issue


filenames
12345678
1234_5678
1234-5678
1234 5678
1234+5678

Search: 1234
all returned

Search: 234 or 2345
Nothing returned

search: 1234_5
returns: 1234_5678

search: 1234-5
returns: 1234-5678

Search: 1234-5
returns:1234-5678

Are "_" and "-" treated as "special"?

Search: 5678
Returns:
1234 5678
1234+5678
DOES NOT return?
1234-5678
1234_5678

Is there a workaround or a patch?
We use "_" and "-" in our filenames as separators.


thanks
brendon



Peter Donker
Veteran Member
Posts:4536


--
12/03/2010 10:31 PM
If this is solvable, it will be solved in 5.3.4

Peter
Brendon
New Member
Posts:4


--
12/06/2010 8:20 PM
Chance of being "solvable"? 50:50 ; "yes, we can" ; scratches head
or is this purely from the "Priority Minor " status?

ETA to 5.3.4 ?

Just curious, as we have had to teach users to type the prefix for everything (####-1234), and you know how users are.


Peter Donker
Veteran Member
Posts:4536


--
12/07/2010 3:36 PM
Hi Brendon,

Lucene does not support searching on parts of a string that are not the beginning. I.e. 12 works, but 23 doesn't. This is a quirk of Lucene, not DMX.

Peter
Peter Donker
Veteran Member
Posts:4536


--
12/07/2010 3:38 PM
BTW .. the issue of the dash was resolved earlier and DMX 5.3.3 handles these just fine. Your search results illustrated that. 1234-5 does indeed return 1234-5678. I just reverified this.

Peter
Peter Donker
Veteran Member
Posts:4536


--
12/07/2010 3:58 PM
One more thing. In testing I notice that for attribute values there still is an issue. I'll make a note of that and see if this can be taken up for the major revision of the search logic foreseen for DMX 6. In the current form I have not found a solution to this issue.

Peter
Brendon
New Member
Posts:4


--
12/10/2010 8:45 PM
Hi Peter

"BTW .. the issue of the dash was resolved earlier and DMX 5.3.3 handles these just fine. Your search results illustrated that. 1234-5 does indeed return 1234-5678. I just reverified this."
Agreed that 1234-5 does indeed return 1234-5678.

Search: 5678
Returns:
1234 5678
1234+5678
DOES NOT return?
1234-5678
1234_5678

It's as if "-" and "_" concatenate the two "words" into one, and then the rule that "Lucene does not support searching on parts of a string that are not the beginning" kicks in.
But " " and "+" split the name, so 5678 is a word of its own and does show up. So "-" and "_" are treated differently from "+" and " ".
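
To make that hypothesis concrete, here is a small Python model of it. It is only a sketch under two assumptions that come from this thread, not from the DMX or Lucene source: names are split into words on whitespace and "+" only (so "-" and "_" stay inside a word), and a query matches when some word starts with it. The names `tokenize` and `search` are made up for illustration.

```python
import re

# The five test filenames from the earlier post.
FILENAMES = ["12345678", "1234_5678", "1234-5678", "1234 5678", "1234+5678"]


def tokenize(name):
    # Assumption: only whitespace and "+" separate words;
    # "-" and "_" are kept inside a word.
    return [t for t in re.split(r"[\s+]+", name) if t]


def search(query):
    # Assumption: prefix matching only. A file is a hit if any of its
    # words starts with the query string.
    return [n for n in FILENAMES if any(t.startswith(query) for t in tokenize(n))]


print(search("5678"))    # ['1234 5678', '1234+5678']
print(search("1234-5"))  # ['1234-5678']
print(search("234"))     # []
```

Under these two assumptions the model gives the same hits as every search listed above, which is consistent with the hypothesis but does not prove that this is what the indexer actually does.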

---
Did some investigation

"Lucene does not support searching on parts of a string that are not the beginning. I.e. 12 works, but 23 doesn't. This is a quirk of Lucene, not DMX."
I opened the Lucene index with Luke (http://www.getopt.org/luke/webstart.html):

Connect to index
SEARCH TAB
ANALYSIS : org.apache.lucene.analysis.SimpleAnalyzer
Default Field: Title
Query Parser: Allow leading * in wildcard queries

Search Expression:
*567* returns everything with 567 in it
567* returns everything with a "word" starting with 567, e.g. "1234+5678.txt"

So enabling "Allow leading * in wildcard queries" and using * in the query does seem to let me search on parts of a string that are not the beginning.
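
The difference between the two expressions can be sketched in Python with shell-style wildcard matching over the words of each title. This is an illustration only: the titles are the test filenames with a ".txt" extension assumed, the split on whitespace and "+" is the hypothesis from earlier in this post rather than the real analyzer, and `terms` and `wildcard_search` are made-up names.

```python
import re
from fnmatch import fnmatchcase

# Test titles; the ".txt" extension is assumed for illustration.
TITLES = [
    "12345678.txt",
    "1234_5678.txt",
    "1234-5678.txt",
    "1234 5678.txt",
    "1234+5678.txt",
]


def terms(title):
    # Assumption: words are split on whitespace and "+" only.
    return [t for t in re.split(r"[\s+]+", title) if t]


def wildcard_search(pattern):
    # A title is a hit if any of its words matches the wildcard pattern.
    return [t for t in TITLES if any(fnmatchcase(w, pattern) for w in terms(t))]


print(wildcard_search("567*"))        # ['1234 5678.txt', '1234+5678.txt']
print(len(wildcard_search("*567*")))  # 5
```

A trailing wildcard still needs the word to start with 567, while a leading wildcard lets the match begin anywhere inside the word, which is why it also finds the "-" and "_" names.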

This led me to "The current QueryParser disallows leading wildcard characters by default."
https://issues.apache.org/jira/browse/LUCENE-1795
(LUCENE-1795.patch, dated 2009-08-10 08:52 AM)

I'm going to have to check DNN to see which Lucene version and patches I have.

Thanks for the assist so far
Peter Donker
Veteran Member
Posts:4536


--
12/15/2010 3:52 PM
Hi Brendon,

Thanks. I will need to check if this is a later addition or an oversight on my part.

Peter
Peter Donker
Veteran Member
Posts:4536


--
12/15/2010 3:53 PM
Note we're using Lucene.Net, which trails Lucene and may be stagnant altogether as a project.
Brendon
New Member
Posts:4


--
12/20/2010 6:25 PM
Stagnant, maybe just a little, but not dead.

2.9.2 2010-02-26
https://svn.apache.org/repos/asf/lucene/lucene.net/tags/Lucene.Net_2_9_2/src/CHANGES.txt

of interest
======================= Release 2.1.0 2007-02-14 =======================

4. LUCENE-489: Add support for leading wildcard characters (*, ?) to
QueryParser. Default is to disallow them, as before.
(Steven Parkes via Otis Gospodnetic)

Looks like DMX is using a very old version (2.0.0.4)?

thanks
Peter Donker
Veteran Member
Posts:4536


--
12/22/2010 1:23 PM
Hi Brendon,

Thanks for the heads up on that. I'll go and download that. Note I tweaked Lucene myself a bit as I needed to make sure it was doing everything isolated in a DNN portal.

Peter