FAROO Forum • View topic - Are all words stored to the distributed index?

Postby TomHH » Wed Dec 22, 2010 12:51 pm

When I look at the Index tab in Faroo, I sometimes observe that a word is stored to my own peer.
Does this mean that a word is not always stored to my own peer?
TomHH
 
Posts: 40
Joined: Sat Dec 18, 2010 5:37 am

Re: Are all words stored to the distributed index?

Postby Wolf » Thu Dec 23, 2010 5:34 pm

TomHH wrote:When I look at the index tab in Faroo I sometimes observe that a word is stored to the own peer.
Does this mean that a word is not always stored to the own peer?


A word is stored to the local index only if the user visits a web page containing that word AND Options / Index / Index web history is enabled.
Web pages indexed by the active crawler are stored to those peers whose PeerIDs are closest to the hashes of the words they contain.
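
To illustrate the "closest PeerID to the hash of a word" idea, here is a minimal sketch. It is not FAROO's actual code: the SHA-1 hash, the XOR distance metric, the replication factor k, and the function names are all assumptions chosen for illustration, since the thread does not say which hash or distance FAROO uses.

```python
import hashlib

def word_hash(word):
    """Map a word to an integer ID (assumption: SHA-1 of the lowercased word)."""
    digest = hashlib.sha1(word.lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def closest_peers(word, peer_ids, k=3):
    """Return the k peer IDs closest to the word's hash.

    Assumption: closeness is measured as XOR distance, as in many
    distributed hash tables; the real metric may differ.
    """
    target = word_hash(word)
    return sorted(peer_ids, key=lambda pid: pid ^ target)[:k]

# Hypothetical peer IDs, derived here by hashing made-up peer names.
peers = [word_hash(name) for name in ("peer-a", "peer-b", "peer-c", "peer-d")]
responsible = closest_peers("index", peers, k=2)
```

With a scheme like this, the peer that crawls a page is usually not among the peers responsible for its words, which would explain why a word only sometimes ends up on your own peer.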
Wolf
Site Admin
 
Posts: 130
Joined: Wed Dec 17, 2008 12:28 pm

Re: Are all words stored to the distributed index?

Postby TomHH » Thu Dec 23, 2010 11:35 pm

Wolf wrote:A word is stored to the local index only, if the user visits a web page containing that word AND if Options / Index / Index web history is enabled.
Web pages indexed by the active crawler are stored to those peers, whose PeerID are closest to the hash of the contained words.

Does this mean that some words stored to my local index will never be stored to the distributed index?
If so, isn't this loss of already collected information which could better be spread to the distributed index?
TomHH
 
Posts: 40
Joined: Sat Dec 18, 2010 5:37 am

Re: Are all words stored to the distributed index?

Postby Wolf » Sun Dec 26, 2010 11:57 pm

TomHH wrote:
Wolf wrote:A word is stored to the local index only, if the user visits a web page containing that word AND if Options / Index / Index web history is enabled.
Web pages indexed by the active crawler are stored to those peers, whose PeerID are closest to the hash of the contained words.

Does this mean that some words stored to my local index will never be stored to the distributed index?
If so, isn't this loss of already collected information which could better be spread to the distributed index?

I guess my explanation was neither precise nor complete:
A word is stored to the local index only under this condition: the user visits a web page containing that word AND Options / Index / Index web history is enabled.
But a word is always stored to the distributed index, on those peers whose PeerIDs are closest to the hash of that word.

In short: words are not always stored to the local index, but they are always stored to the distributed index.
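
The rule can be written down as a small decision function. This is only a sketch of the rule as stated in this post; the function name and the parameter names are made up for illustration.

```python
def index_targets(visited_by_user, index_web_history_enabled):
    """Return the set of indexes a word is stored to, per the rule above."""
    targets = {"distributed"}  # always stored to the distributed index
    if visited_by_user and index_web_history_enabled:
        targets.add("local")   # local index only under both conditions
    return targets
```

For a page fetched by the active crawler, index_targets(False, True) yields only the distributed index; for a page the user visits with the option enabled, it yields both.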
Wolf
Site Admin
 
Posts: 130
Joined: Wed Dec 17, 2008 12:28 pm

Re: Are all words stored to the distributed index?

Postby TomHH » Sat Feb 05, 2011 10:42 am

I'm still confused...

If a page is in the crawler queue, it eventually gets crawled and its indexed words are stored to the distributed index. In the Index tab of the peer I can watch this process, and it takes a while until all cached words are sent to other peers. Once "Words in cache" reaches 0, additional pages get crawled, and so on.
So the sequence is: crawl page from crawler queue -> cache -> store to distributed index -> crawl page from crawler queue -> cache -> ...

On the other hand, if I visit a page (with the Index web history option enabled), its words are indexed and immediately written to the local index, which is much faster than storing to the distributed index.
From your last post I understand that these words are also stored to the distributed index.
I would assume a sequence like: Crawl visited page -> cache -> store to local index -> (some delay) store to distributed index

Where can the last part be seen?
Is there any information available on how many words in the local index are waiting to be stored to the distributed index?
TomHH
 
Posts: 40
Joined: Sat Dec 18, 2010 5:37 am

Re: Are all words stored to the distributed index?

Postby Wolf » Thu Feb 10, 2011 12:12 pm

TomHH wrote:I would assume a sequence like: Crawl visited page -> cache -> store to local index -> (some delay) store to distributed index
Where can the last part be seen?
Is there any information available how many words in the local index are waiting to be stored to the distributed index?

There are two independent caches for the local and the distributed index. The sequence is:
Crawl -> cache for local index -> store to local index
Crawl -> cache for distributed index -> store to distributed index

The number of words waiting in each of the two caches is displayed separately ("words in cache") in the index tab.
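
As a rough model of the two independent caches, consider the sketch below. It is not FAROO's implementation: the class, the method names, and the flush sizes are hypothetical, and it only shows how each cache can drain at its own speed while reporting its own "words in cache" count.

```python
from collections import deque

class IndexCache:
    """A queue of words waiting to be written to one index (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()  # words still waiting in the cache
        self.stored = []        # words already written to the index

    def add(self, words):
        self.pending.extend(words)

    def flush(self, max_words):
        """Store up to max_words pending words; return how many were stored."""
        count = min(max_words, len(self.pending))
        for _ in range(count):
            self.stored.append(self.pending.popleft())
        return count

    @property
    def words_in_cache(self):
        return len(self.pending)

def crawl(words, local_cache, distributed_cache, store_locally):
    """Feed the same crawled words into both caches independently."""
    if store_locally:
        local_cache.add(words)
    distributed_cache.add(words)

local = IndexCache("local")
distributed = IndexCache("distributed")
crawl(["peer", "index", "hash"], local, distributed, store_locally=True)
local.flush(max_words=100)      # local writes are fast: everything is stored
distributed.flush(max_words=1)  # sending to remote peers is slower
```

After these calls, local.words_in_cache is 0 while distributed.words_in_cache is 2. Neither cache waits for the other, so there is no "local index -> distributed index" step to observe.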
Wolf
Site Admin
 
Posts: 130
Joined: Wed Dec 17, 2008 12:28 pm

