Bring2mind Forums

Content search not working for PDFs
Last Post 08/02/2012 11:35 AM by Peter Donker. 8 Replies.
psommer
New Member
Posts:5
07/23/2012 9:21 PM
Hello, the search works for the title of a document and it searches the content of simple text files, but it won't search the content of PDFs.

I know you indicated that this means there is an iFilter problem on the server but my server provider says everything is set up:

"I have confirmed that the required iFilters are installed on the server, and that the site is currently running in Full Trust (which is our standard server configuration). If you continue to have issues with the PDF search I suggest consulting Peter at Bring2Mind as he may have some further information on configuration of the module. If there's anything that may be required on the server level let us know and we can look in to getting that done for you."

Here is the search log error message:
"Writer exception
23 11:57:42 E: 0 Exception: no segments* file found in Bring2mind.Lucene.Net.Store.SimpleFSDirectory@C:\inetpub\vhosts\######.###\httpdocs\Portals\0\DMX\Lucene\Index\: files: write.lock
at Bring2mind.Lucene.Net.Index.IndexWriter.Init(Directory d, Analyzer a, Boolean create, Boolean closeDir, IndexDeletionPolicy deletionPolicy, Boolean autoCommit, Int32 maxFieldLength, IndexingChain indexingChain, IndexCommit commit)
at Bring2mind.Lucene.Net.Index.IndexWriter..ctor(Directory d, Analyzer a, Boolean create, IndexDeletionPolicy deletionPolicy, MaxFieldLength mfl)
at Bring2mind.DNN.Modules.DMX.Services.Search.LuceneSearchProvider.LuceneSearchProvider.ᜀ(Int32 A_0)"

Thanks for any help you can provide.

Paul
Kate
New Member
Posts:29
07/24/2012 10:50 PM
I was having the same problem, except no error message. Somebody advised me not only to install Adobe iFilter 9, but also Microsoft Filter Pack 2.

psommer
New Member
Posts:5
07/24/2012 11:49 PM
Thanks for your post, Kate. I just had the host check for both filter packs, and they had to install the Microsoft Filter Pack 2.0.

Then I cycled the application pool for my dotnetnuke install and reindexed my DMX installation.

When I tried a search it still didn't pick up any of the content inside the PDFs.

Baby steps I guess. :)
Kate
New Member
Posts:29
07/25/2012 1:05 AM
You know, there may be a setting somewhere that causes this problem that nobody is aware of.

I say that because we had the same problem on our hosted DNN site, where you have no access to the server and the host does all of the maintenance on your site. Our host's technical support installed Adobe iFilter and restarted the site, then I reindexed everything, and nothing changed. The search only worked on the contents of text files. It didn't search PDF content and it didn't search the content of Word files either.

We switched to a hosted virtual machine and installed DNN and all of the iFilters ourselves, and when we tried searching PDFs we had no problems whatsoever.

So again, I feel like there's something with how DNN is getting set up on some of these hosted sites that is interfering with content search.
Peter Donker
Veteran Member
Posts:4536
07/25/2012 9:36 AM
@psommer: your error message

"Writer exception
23 11:57:42 E: 0 Exception: no segments* file found in Bring2mind.Lucene.Net.Store.SimpleFSDirectory@C:\inetpub\vhosts\######.###\httpdocs\Portals\0\DMX\Lucene\Index\: files: write.lock"

hints at an issue internal to Lucene. Can you try to completely delete the index (portals/0/DMX/Lucene/Index) and reindex? I've not seen Lucene issues persist after a fresh start.
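
Before deleting anything, it may help to confirm that the index folder is in the state the log describes: only a write.lock and no segments file. The following is a minimal Python sketch for that check, not part of DMX. The function names are made up for illustration, and it assumes you can run a script against a copy of the folder or on the server itself.

```python
from pathlib import Path


def inspect_index_dir(index_dir):
    """Summarise what a Lucene index folder contains (hypothetical helper)."""
    path = Path(index_dir)
    if not path.is_dir():
        return {"exists": False, "has_segments": False,
                "has_write_lock": False, "files": []}
    files = sorted(p.name for p in path.iterdir() if p.is_file())
    return {
        "exists": True,
        # Lucene records its commit points in files whose names start with "segments".
        "has_segments": any(name.startswith("segments") for name in files),
        # A write.lock may be left over from a writer that did not shut down cleanly.
        "has_write_lock": "write.lock" in files,
        "files": files,
    }


def needs_fresh_start(report):
    """True when the folder exists but holds no segments file,
    which is the situation the 'no segments* file found' error reports."""
    return report["exists"] and not report["has_segments"]
```

For example, `needs_fresh_start(inspect_index_dir(r"...\Portals\0\DMX\Lucene\Index"))` returning True would match the error above, with the path adjusted to your own install.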

Peter
psommer
New Member
Posts:5
07/25/2012 6:44 PM
Well, I deleted the Lucene index and reindexed the whole installation, but the search is still the same.

The hosting provider even sent me a screen shot to prove that he had the iFilters installed. LOL

I have tried restarting IIS, cycling the application pool, and reindexing many times but nothing seems to make any difference.

However, the search log no longer shows that writer exception error message, so that's good.

Is there any log for the indexing?
psommer
New Member
Posts:5
07/30/2012 10:51 PM
Is there something I should have the hosting provider double check to make sure everything is set up correctly?

psommer
New Member
Posts:5
08/01/2012 3:52 PM
Is there some other way to get support for DMX? I would like to get this working.

Thanks
Peter Donker
Veteran Member
Posts:4536
08/02/2012 11:35 AM
If content indexing works for plain text files but not for another format, then it can only be down to the following ...

Content indexing uses iFilters to read the contents of proprietary/non-plain-text format files (Word, PDF, etc). To do this it passes to the system the file information and asks the system "please open this file and give me back the textual content". There are 2 points of failure:

1. The iFilter is not installed, is outdated, or erroneous. I.e. the iFilter failed to be instantiated by the system or the iFilter couldn't read the file. I've heard of situations where the Acrobat filter was outdated and could not read a recently produced PDF.

2. The system does not allow ASP.NET code to call iFilters. This is a security issue. I.e. the hosting provider does not allow calling the unmanaged code of query.dll, which is a machine-level dll. Over the years I've come to the conclusion that in shared hosting this is often the reason for Lucene not indexing file contents.
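
For the "outdated filter" case in point 1, one thing that can be checked without server access is which PDF version the affected files declare in their header. The Python sketch below only reads that header; it does not call any iFilter, and which versions a particular filter can read is not something it can tell you. The function name is hypothetical.

```python
import re

# A PDF file starts with a header such as "%PDF-1.7"; look for it near the top.
_PDF_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_header_version(path):
    """Return the version string declared in the PDF header, or None if absent."""
    with open(path, "rb") as handle:
        head = handle.read(1024)  # the header is expected at the start of the file
    match = _PDF_HEADER.search(head)
    return match.group(1).decode("ascii") if match else None
```

If the PDFs that fail to index all declare a newer version than the ones that index fine on another machine, that points towards the filter rather than permissions. If there is no such pattern, point 2 becomes the more likely candidate.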

Peter