Looking at KDE4 and related technologies, a few things about data storage struck me. Not that I know anything about it, but that does not usually prevent me from trying to be clever about it.
Several parts need to index, cache or store data.
- Strigi indexes your hard disk. It is just an index, with no real data
- Akonadi caches your data, especially email/contacts/calendar items, and in the future probably also instant messaging logs, notes and more
- Nepomuk handles and stores tags on files and other data
- Amarok indexes your music collection and stores related info, including statistics
There are probably many others besides these, but these are the ones I noticed.
How do these apps store their things
As the smart reader has guessed, all of these use different storage methods.
Strigi indexes your data and stores it in CLucene's on-disk format. If you remove your Strigi indexes, all you lose is the CPU time used to generate them.
Strigi is not a KDE thing, but the KDE libraries depend on it.
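The point about the Strigi index being disposable can be shown with a toy example. This is only a sketch of the general idea of an inverted index, not how Strigi or CLucene actually work; the `build_index` function and the file names are made up for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Build a toy inverted index: word -> set of document names."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            index[word].add(name)
    return dict(index)


# Hypothetical "files on disk"; names and contents are invented.
documents = {
    "notes.txt": "kde storage notes",
    "todo.txt": "fix storage backend",
}

index = build_index(documents)
before = index["storage"]

# Throw the index away and rebuild it from the original files.
# Nothing is lost except the CPU time spent rebuilding.
del index
index = build_index(documents)
assert index["storage"] == before
```

The real data lives in the files; the index is derived from them and can always be regenerated.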
Akonadi caches your PIM data and makes it easily available to applications. Akonadi does not itself keep the data. The Akonadi people evaluated SQLite but found it insufficient. They also evaluated MySQL/Embedded but had issues with that too, so they ended up using a full MySQL server.
Akonadi is not a KDE thing, but it originates from KDE and is used by the KDE PIM libraries.
Nepomuk handles and stores metadata and tags on files, emails and other data. The Nepomuk data is all in RDF format. Nepomuk uses Soprano for storage, and Soprano has pluggable backends.
One Soprano backend, probably the most used, is the Redland backend, which is too slow for effective Nepomuk usage, but is native C/C++ code. Redland can internally use Berkeley DB or MySQL. Soprano uses it with Berkeley DB.
The other Soprano backend uses Sesame2, a Java RDF store. Quite a few people are against this because it is Java. Internally it seems to use its own on-disk format, but it is much more efficient than the Redland/BDB backend and is the recommended one to use.
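To make "RDF data behind pluggable backends" a bit more concrete, here is a minimal sketch of the pattern. It is not Soprano's API: the `TripleBackend`, `InMemoryBackend` and `TagStore` classes and the `hasTag` predicate are all invented for illustration, and a real backend would sit on top of Redland or Sesame2 rather than a Python set.

```python
from abc import ABC, abstractmethod


class TripleBackend(ABC):
    """Hypothetical backend interface: stores (subject, predicate, object)."""

    @abstractmethod
    def add(self, subject, predicate, obj):
        ...

    @abstractmethod
    def match(self, subject=None, predicate=None, obj=None):
        ...


class InMemoryBackend(TripleBackend):
    """Stand-in backend; another class could implement the same interface."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        # None acts as a wildcard for that position.
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


class TagStore:
    """Tagging layer that only talks to the backend interface."""

    def __init__(self, backend):
        self._backend = backend

    def tag(self, resource, tag):
        self._backend.add(resource, "hasTag", tag)

    def tags_of(self, resource):
        matches = self._backend.match(subject=resource, predicate="hasTag")
        return [obj for _, _, obj in matches]


store = TagStore(InMemoryBackend())
store.tag("file:///home/me/song.ogg", "favourite")
store.tag("file:///home/me/song.ogg", "jazz")
print(store.tags_of("file:///home/me/song.ogg"))
```

Because `TagStore` only depends on the interface, swapping the storage underneath does not change the code that uses it, which is the whole appeal of a pluggable backend.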
Amarok indexes and plays your music and keeps statistics on it. The Amarok people evaluated SQLite and found it not good enough for the job. They are going with MySQL/Embedded for now.
All in all, we have a CLucene index, a full MySQL server, a Berkeley DB or custom format, and MySQL/Embedded in use. I wonder how much communication there has been between these projects regarding the choice of storage.
Since I already said at the beginning that I don’t know enough about what I am talking about, I also feel well suited to make suggestions for the future.
What if …
- Amarok hooked into the Akonadi MySQL database process
- Amarok skipped the concept of databases, let Strigi index the music, and stored the metadata and statistics in Nepomuk
- The Soprano Redland backend hooked into the Akonadi MySQL database instead of a Berkeley DB file
That’s it for now. Next up, maybe: interesting places to use Plasma.