Bring2mind Forums

pdf contents indexing on bulk import
Last Post 09/01/2008 8:02 PM by knappster. 8 Replies.
knappster
New Member
Posts:15
08/28/2008 5:45 PM

Hi,

I have the Adobe iFilter set up so that the contents of PDFs are indexed, and it works fine when I upload a document.  However, I need to bulk import a load of PDFs using your Import facility, and when I do, their contents are not indexed.  I have tried running your re-index script afterwards, but it makes no difference.  Any idea how I can get the PDF contents indexed with the bulk import facility?

Cheers,

Lee.

Peter Donker
Veteran Member
Posts:4536
08/29/2008 9:36 AM
Hi Lee,

Both should index the documents. The same code is run whether it's a regular upload or an import, so there must be a difference with the files somewhere. After reindexing, can you still find the old ones that did work?

Peter
knappster
New Member
Posts:15
08/29/2008 10:18 AM

Hi Peter,

Thanks for your reply.  This is what happens.  If I bulk import documents and use Luke to view what has been stored for the document, I can see that nothing has been stored for the tag.  If I then download the same PDF from DMX, so that I know I have exactly the same document, and upload it on its own, then in Luke the tag is filled.  If I then use the re-index script, both PDFs lose their tag, so I cannot find either file now.

I am using 04.02.03 at the moment if that makes any difference?  I think you were speaking to my colleague about an indexing issue in 04.03.00 so we can't use that at the moment until your fix for that is released.

Cheers,
Lee.

knappster
New Member
Posts:15
08/29/2008 3:22 PM

Hi Peter,

Further to this problem, I have also found that after using the re-indexing script, the indexing of a single PDF does not work for about half an hour afterwards...

Also, in my last post, where I have put "tag" it should be "contents tag". It looks like the text box removed the word "contents" because it had less-than and greater-than symbols around it!

Cheers,
Lee.

Peter Donker
Veteran Member
Posts:4536
08/29/2008 4:25 PM
Hi Lee,

Let's try this for a quick fix: recycle the app pool on the server and rename the portals/0/dmx/lucene directory to lucene2 or something (i.e. we're completely resetting Lucene). Then go to your installation and hit reindex. Now the lucene and subfolders should reappear with some content.
In the meantime I'll make a note of this and run a similar test.
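
The rename step above could be scripted along these lines. This is only a minimal sketch: the function name `reset_lucene_dir`, the `site_root` argument and the default backup name are assumptions for illustration, and it does not recycle the app pool or trigger the reindex, which still have to be done on the server and in DMX.

```python
from pathlib import Path


def reset_lucene_dir(site_root, backup_name="lucene2"):
    """Rename <site_root>/portals/0/dmx/lucene to a sibling backup folder.

    Returns the new path, or None if there was no lucene folder to rename.
    """
    index_dir = Path(site_root) / "portals" / "0" / "dmx" / "lucene"
    if not index_dir.is_dir():
        return None  # nothing to reset
    backup_dir = index_dir.with_name(backup_name)
    if backup_dir.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"{backup_dir} already exists; choose another backup_name")
    index_dir.rename(backup_dir)
    return backup_dir
```

Run it after recycling the app pool and before hitting reindex, so that the lucene folder is rebuilt from scratch.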

Peter
knappster
New Member
Posts:15
08/29/2008 5:30 PM

Hi Peter,

Right, I've narrowed the bug down now.  If you recycle the app pool, everything works OK one time, i.e. you can re-index fine and all the PDFs' contents are indexed, or you can add a single PDF and all of that PDF's contents are indexed.  However, if you try to add more than one PDF, the second PDF's contents are not indexed, and if you try to add a PDF after re-indexing, that PDF's contents are not indexed either.  This can be fixed by recycling the app pool again.  So it seems that the DLL used to do the PDF indexing is not being released after a PDF is added or the re-index script is run.  By the way, I am using iFilter v6 if that makes any difference.  Any ideas much appreciated, as most of our documents are PDFs and we really need this indexing to work!

Cheers
Lee.

Peter Donker
Veteran Member
Posts:4536
09/01/2008 12:18 PM
Hi Lee,
Thanks for the analysis. I'll get right to this. Please contact me on peter at bring2mind.net so I can give you a test version for this problem.
Cheers,
Peter
knappster
New Member
Posts:15
09/01/2008 8:02 PM

Hi Peter,

Just to let you, or anyone else having this problem, know that the error is caused by the Adobe iFilter v6.  If you use v5 then all is ok.

Thanks for your efforts anyway.

Lee.

Peter Donker
Veteran Member
Posts:4536
09/04/2008 9:49 AM

Murray has reverted from Adobe's iFilter v6 to v5, and that solved this issue. It appears the DLL did not unload properly. This was indeed a candidate, as the error occurred on (rapid) successive indexing of multiple PDFs, and it would always fail on the second one. So this explanation fits. To all looking at this thread: go with v5 instead of v6 if you have this problem.

Technical backgrounder:

Adobe PDF filter crashing the application when it's closed
There are quite a few reports about problems with the Adobe PDF filter v6. I researched this issue for some time, and I believe I found what the problem is. It seems Adobe forgot (or not..) to export the DllCanUnloadNow function from their PDFFILT.dll. Since a filter is implemented as a COM object, it should export this function to let COM know when it can unload this library. It seems that this causes problems for C# applications because the .dll is never unloaded, and when it is, it's probably a bit late.
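
One way to check the missing-export theory on your own server is to load the filter DLL and look for the standard COM entry points. The following is a minimal sketch, not a verified diagnostic: the helper names `has_export` and `check_filter_dll` are made up for illustration, the path to PDFFILT.dll has to be supplied by you, and the Python interpreter must match the DLL's bitness (a 32-bit DLL will not load into 64-bit Python).

```python
import ctypes
import sys

# Standard entry points of an in-process COM server.
COM_EXPORTS = ("DllGetClassObject", "DllCanUnloadNow")


def has_export(library, name):
    """Return True if the loaded library exposes a symbol called `name`."""
    try:
        getattr(library, name)  # ctypes looks the export up on attribute access
    except AttributeError:
        return False
    return True


def check_filter_dll(dll_path):
    """Load the DLL at dll_path (Windows only) and report which COM exports exist."""
    if sys.platform != "win32":
        raise OSError("this check only works on Windows")
    library = ctypes.WinDLL(dll_path)  # raises OSError if the DLL cannot be loaded
    return {name: has_export(library, name) for name in COM_EXPORTS}
```

If `check_filter_dll` reports `DllCanUnloadNow` as `False` for the v6 filter, that would be consistent with the explanation above.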