This week was full of work, and we have finally reached the point where simple analysis is possible. Over the last week I applied the changes mentioned in the proposal to make the mallet package easy to use.

The train and predict functions for the topic model have been implemented. Visualization techniques are also available in the repository: one can view the structure of words for different topics and the corresponding wordclouds.

Figure: Wordcloud

The first figure shows the wordcloud of one of the topics, most probably one built from the romance books.

Figure: Network

The second figure shows the network of words within different topics.

If you want to see how my package works at the moment, take a look at the first tests provided in:

This week I plan to implement some more basic transformation functions. The idea is to give tmCorpus an interface similar to the one that exists for VCorpus. Functions such as tm_map, tm_filter, and tm_reduce are very well designed and users are accustomed to them, which is the main reason for keeping them unchanged.
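
The tm functions themselves are R, and the tmCorpus code is not shown in this post. The following is therefore only a minimal Python sketch of the map/filter/reduce idea under stated assumptions. `TmCorpus` and its methods are hypothetical stand-ins, not the package's API. tm_map applies a transformation to every document. tm_filter keeps the documents that satisfy a predicate. tm_reduce folds several transformations into one. The order in which the folded functions run is a choice made by this sketch.

```python
from functools import reduce
from typing import Callable, List

Transform = Callable[[str], str]


class TmCorpus:
    """Hypothetical Python stand-in for a corpus object; it only
    illustrates the map/filter/reduce interface, not the R package."""

    def __init__(self, documents: List[str]) -> None:
        self.documents = list(documents)

    def tm_map(self, fun: Transform) -> "TmCorpus":
        # Apply a transformation to every document and return a new corpus,
        # leaving the original corpus unchanged.
        return TmCorpus([fun(doc) for doc in self.documents])

    def tm_filter(self, predicate: Callable[[str], bool]) -> "TmCorpus":
        # Keep only the documents for which the predicate returns True.
        return TmCorpus([doc for doc in self.documents if predicate(doc)])


def tm_reduce(funs: List[Transform]) -> Transform:
    # Fold several transformations into a single function; in this sketch
    # they are applied in list order (the first function runs first).
    return lambda doc: reduce(lambda acc, f: f(acc), funs, doc)


if __name__ == "__main__":
    corpus = TmCorpus(["A Love Story ", "  short", "Another ROMANCE novel"])
    cleaned = corpus.tm_map(tm_reduce([str.strip, str.lower]))
    long_docs = cleaned.tm_filter(lambda d: len(d.split()) > 1)
    print(long_docs.documents)  # ['a love story', 'another romance novel']
```

Returning a new corpus from every call lets the operations be chained freely, in the same spirit as the tm functions that tmCorpus is meant to resemble.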

This week I also plan to look at other packages that were not mentioned in the proposal. The aim is to extend the package's capabilities as much as possible during the project.

Finally, textmining already has more than 100 commits 🙂