Abstract: This paper presents a preliminary design of a web log preprocessing data stream built in Clementine. The stream implements data cleaning, user identification, session identification, and path completion. It also provides auxiliary functions such as log merging, data auditing, coding specification, and association with external information. Experimental results indicate that web log preprocessing in Clementine is feasible, which lays a foundation for further log mining on the same platform. To some extent, this resolves the problem that web log preprocessing and web log mining are handled by different tools, and thus improves the degree of automation of web log mining.
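To make the first three preprocessing steps concrete, the following is a minimal Python sketch written outside Clementine; it is not the stream described in this paper. The record layout `(ip, user_agent, timestamp, url, status)`, the 30-minute session timeout, the list of static file suffixes, and all function names are assumptions chosen for illustration. Path completion is omitted because it additionally requires referrer data and the site topology.

```python
from collections import defaultdict
from datetime import timedelta

# Assumed record layout: (ip, user_agent, timestamp, url, status),
# where timestamp is a datetime.datetime value.
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed heuristic, not taken from the paper
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")  # assumed list


def clean(records):
    """Data cleaning: drop failed requests and embedded static resources."""
    return [
        r for r in records
        if 200 <= r[4] < 400 and not r[3].lower().endswith(STATIC_SUFFIXES)
    ]


def identify_users(records):
    """User identification: treat each (ip, user_agent) pair as one user."""
    users = defaultdict(list)
    for ip, agent, ts, url, _status in records:
        users[(ip, agent)].append((ts, url))
    return users


def identify_sessions(users):
    """Session identification: split a user's requests at gaps above the timeout."""
    sessions = []
    for user, hits in users.items():
        hits.sort()  # order by timestamp, then by URL
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > SESSION_TIMEOUT:
                sessions.append((user, current))
                current = []
            current.append(hit)
        sessions.append((user, current))
    return sessions
```

Under these assumptions, calling `identify_sessions(identify_users(clean(records)))` on a list of parsed log records returns one `(user, hits)` pair per session, which could then be passed to a mining step.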