
Bring2mind Forums

Replicated databases and indexing
Last Post 05/02/2013 4:41 PM by Peter Donker. 5 Replies.
torheskje
New Member
Posts: 17
06/23/2010 5:30 PM

We have a database that is replicated once an hour (SQL database replication) between a shore-based installation and installations on vessels, over satellite communication. Each installation has its own web server, a DMX database, and documents stored as files.

Documents are also replicated once an hour (file pull using wget).

Documents are indexed for search when they are stored, so a document stored at one location is not indexed at the other locations.

I assume I have to reindex the whole database (?) once a night or so to get everything indexed.

How do I do that? Or is it possible to index only those documents that have been added in the last X hours/days?
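As a rough illustration of the "last X hours" idea: since the documents are stored as files and pulled with wget, one could list the files whose modification time falls inside the window and hand only those to the indexer. This is a minimal Python sketch under that assumption; the function name is hypothetical, and mapping a file back to its DMX entry is not covered. Note that wget by default sets a downloaded file's timestamp from the server's Last-Modified header (its `--no-use-server-timestamps` option turns that off), so the window should be generous.

```python
import os
import time

def recently_changed_files(root, hours):
    """Return sorted paths under root whose modification time is within the last `hours` hours.

    Hypothetical helper: assumes document files live in ordinary folders under root.
    """
    cutoff = time.time() - hours * 3600
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) >= cutoff:
                    changed.append(path)
            except OSError:
                # File vanished between listing and stat (e.g. mid-replication); skip it.
                continue
    return sorted(changed)
```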

I would appreciate any help on this.

Peter Donker
Veteran Member
Posts: 4536
06/25/2010 4:04 PM
Hi Tor,

The default indexer is Lucene, which creates its own set of files under DMX/Lucene (you can see them there). I don't think those indexes can be merged, so you'd need to reindex a "mother version" of the set of DNN installations and then distribute the Lucene files to the mirrored sites.
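The "mother version" approach could look roughly like this: build the index on one site, then copy its Lucene index folder over each mirror's index folder. A minimal Python sketch, assuming the index folders are reachable as ordinary paths and that a site is not using its index while it is being swapped; the function name and the `.incoming` staging folder are hypothetical.

```python
import os
import shutil

def distribute_index(mother_index, mirror_indexes):
    """Copy a freshly built Lucene index folder over each mirror's index folder."""
    for target in mirror_indexes:
        staging = target.rstrip("\\/") + ".incoming"
        # Copy to a staging folder first so a half-finished copy never replaces a good index.
        if os.path.exists(staging):
            shutil.rmtree(staging)
        shutil.copytree(mother_index, staging)
        # Drop the old index, then rename the complete copy into place.
        if os.path.exists(target):
            shutil.rmtree(target)
        os.replace(staging, target)
```

Copying to a staging folder first keeps the window in which a mirror has no index as short as a delete plus a rename, which matters when the copy goes over a slow link.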

Peter
torheskje
New Member
Posts: 17
07/29/2010 9:21 AM

Sorry for the late reply. I've been on vacation.

Our problem is that all replicating sites are continually updated with new versions of documents and folders, so no single site has the complete index.

Is there a way that I can start reindexing the whole repository at a given time each day?
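Running something "at a given time each day" is normally a job for Windows Task Scheduler. If a small custom service were used instead, the only scheduling logic it needs is the next run time. A minimal Python sketch, assuming local server time; the function name is hypothetical:

```python
from datetime import datetime, timedelta

def next_daily_run(now, hour, minute=0):
    """Return the first datetime at hour:minute that is strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```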

The best solution would be to build a queue of documents to be (re)indexed in DMX_EntryUpdatePath, the trigger on the DMX_Entries table. The trigger could catch all document changes made by "REPLICATION", put each changed entry in a reindexing queue, and then (re)index that document a set time after the replication occurred. We would need the delay because the database and the files stored on the file system are not replicated at the same time.
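The delayed queue proposed above could behave roughly as follows. In practice the queue would presumably be a SQL table filled by the trigger; this in-memory Python sketch only shows the timing rule: each entry waits until its most recent change is at least `delay_seconds` old, giving the files from the next wget pull time to arrive. The class and method names are hypothetical, not part of DMX.

```python
class DelayedReindexQueue:
    """Hold replicated entry IDs until a grace period has passed, then release them for reindexing."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self._pending = {}  # entry_id -> time of the most recent change

    def enqueue(self, entry_id, changed_at):
        # A later change to the same entry restarts its waiting period.
        self._pending[entry_id] = changed_at

    def pop_due(self, now):
        """Remove and return (sorted) entry IDs whose last change is at least `delay` seconds old."""
        due = [e for e, t in self._pending.items() if now - t >= self.delay]
        for e in due:
            del self._pending[e]
        return sorted(due)
```

Keying the queue by entry ID means several replicated updates to the same document collapse into a single reindex.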

Peter Donker
Veteran Member
Posts: 4536
08/04/2010 10:26 PM
Hi Tor,

Likewise back in the office now ...

There is a script in DMX (Admin > Run Scripts) that reindexes a whole installation. Basically it just iterates through all portals and reindexes each of them. As the script is run by calling a page, an external scheduled process might be able to trigger it. Only, that process would need to be logged in as well ... hmm.

Second option: call the API of DMX directly. Not sure if that is an option on your side. It would require programming.

Third option: switch to Microsoft Indexing Service. This is a Windows administration challenge if the web server is not the same machine as the SQL Server.

Contact me by email if you want to discuss the specifics ...

Peter
torheskje
New Member
Posts: 17
04/24/2013 12:25 PM
I ended up reindexing the sites manually.

This works fine on all sites: a reindex takes between 30 and 90 minutes, depending on server power/performance.

I use the following procedure:

1. Delete all files in E:\FLAG\flagweb22\Portals\0\DMX\Lucene\Index
2. Log on to the web application and start Admin > Run Scripts > reindex portal.
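Step 1 of this procedure can be scripted. A minimal Python sketch (hypothetical function name) that empties the index folder but keeps the folder itself; on Windows, files the web application still holds open will raise an error, so it presumably needs to run while the site is not using the index:

```python
import os
import shutil

def clear_index_dir(index_dir):
    """Delete everything inside index_dir, keep the folder itself, and return the number of items removed."""
    removed = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # subfolder: remove it with its contents
        else:
            os.remove(path)  # regular file (or link)
        removed += 1
    return removed
```

For the path in step 1 this would be called as `clear_index_dir(r"E:\FLAG\flagweb22\Portals\0\DMX\Lucene\Index")`.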

However, on one site with the same content as the other sites, the reindex has now been running for two days.

The CPU is nearly idle (99% idle 95% of the time; sqlserver and w3wp.exe are scarcely using any CPU).

Do you have any idea what the cause might be?

Peter Donker
Veteran Member
Posts: 4536
05/02/2013 4:41 PM
Hi Tor,

Difficult to say, but my gut feeling is that it has to do either with the storage provider or with the iFilters. In the script, every document is downloaded (so if you are storing on S3 and have low bandwidth, this can take some time), fed into the iFilter, and the resulting text is then fed to Lucene. Now, I've noticed that the PDF iFilter can take quite some time to return text. I can't say why, just that it does.
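One way to test that gut feeling is to time each stage separately for a sample of documents. A minimal Python sketch, assuming the download, text-extraction and indexing steps can be called as separate functions; the stage names below are hypothetical stand-ins, not DMX or iFilter APIs:

```python
import time

def time_stages(stages, item):
    """Pass item through each (name, func) stage in order.

    Returns the final value and a dict of seconds spent in each stage.
    """
    timings = {}
    value = item
    for name, func in stages:
        start = time.perf_counter()
        value = func(value)
        timings[name] = time.perf_counter() - start
    return value, timings
```

Running a handful of PDFs and a handful of other file types through stages such as `("download", ...)`, `("ifilter", ...)` and `("lucene", ...)` would show whether the storage provider or the iFilter step dominates.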

Just thinking about your situation again: would it make sense to move to Indexing Service-based search? It's a b*tch to set up, but it doesn't need reindexing.

Peter