A blog about ideas relating to philoinformatics (or at least that have something to do with computer science or philosophy)

Friday, July 25, 2008

Putting my old philosophy papers online

I'm going to gather all the old philosophy papers I've written over the years, which isn't very many, and post them with my current opinions of the papers. I don't know how useful or interesting this will be, but it might be fun.

Wednesday, July 16, 2008

Types of "trust" needed in the semantic web

Anyone (or anything) can potentially publish RDF data, so deciding whether to use a given dataset raises many issues of trust. There are (at least) two important categories of trust when it comes to the semantic web.

1) Personal Trust
Personal trust amounts to trusting that the creator of the data has the right motives. Spam is a good example. This kind of trust will become more important as the semantic web grows, but isn't a major problem yet.

2) Reliability Trust
Being able to trust data found on the semantic web requires knowing how reliable the data is. Was it created by experts? Is there a peer review mechanism? Was it generated by automatic natural language processing? Currently this kind of trust is also not a major problem, but I believe it will become important much faster than personal trust. We already know how to address reliability trust: more metadata. We will need data about how the data was generated. (Note that implementing this creates more options for lying about data, and so doesn't really help with personal trust.)
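To make this concrete, here is a rough Python sketch of what reliability metadata might look like. The field names (generated_by, peer_reviewed) and the scores are made up for illustration; they are not taken from any real provenance vocabulary:

```python
from dataclasses import dataclass

@dataclass
class Fact:
    """A published statement plus metadata about how it was generated.
    The metadata fields are hypothetical, not from a standard ontology."""
    subject: str
    predicate: str
    value: str
    generated_by: str      # e.g. "expert", "crowd", "nlp"
    peer_reviewed: bool

def reliability_score(fact: Fact) -> float:
    """Crude heuristic: expert authorship and peer review raise confidence."""
    score = {"expert": 0.9, "crowd": 0.5, "nlp": 0.3}.get(fact.generated_by, 0.1)
    if fact.peer_reviewed:
        score = min(1.0, score + 0.1)
    return score

fact = Fact("dbpedia:Paris", "population", "2125246", "nlp", False)
print(reliability_score(fact))  # 0.3
```

The point isn't the particular numbers, which are arbitrary, but that a consumer can only compute something like this at all if the generation metadata is published alongside the data.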

Thursday, July 3, 2008

Getting rough data into the semantic web

It's nice when regular relational databases, Excel spreadsheets, and other sources that people have created can be attached to the semantic web. But there are problems with the way most data is recorded, especially data about time. In short, the problem is that people round values. This may seem like a trivial problem, but it isn't if you want to be able to use the rough data that makes up most of the world's data. Here are the issues:

1) Datatype granularity. Datatypes allow a wide range of possible values, but not all values. In some situations we need to know whether the value as written exactly matches the value that was intended. For instance, 1/3 can't be represented fully as a double, and 3pm today can't be represented as a date without dropping the 3pm part.
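Both of these examples are easy to demonstrate in Python, where floats are IEEE doubles and `date` carries no time-of-day:

```python
from fractions import Fraction
from datetime import date, datetime

# 1/3 cannot be stored exactly as a binary double; the stored value is
# only an approximation of the value that was intended.
x = 1 / 3
print(x == Fraction(1, 3))   # False
print(Fraction(x))           # the exact value the double actually holds

# A date datatype drops the time-of-day: "3pm on July 3" becomes just "July 3".
moment = datetime(2008, 7, 3, 15, 0)
print(moment.date())         # 2008-07-03
```

Nothing in the stored double or the stored date records that an approximation happened, which is exactly the missing metadata.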

2) Granularity used vs datatype granularity. This is a much bigger problem than (1). People often record values at a granularity much coarser than the datatype allows. For instance, you may be recording distance to 2 decimal places and storing it as a float. This needs to be taken into account wherever values are compared: we don't want to say that two things have the same height just because they have almost the same height. Similarly, people round times to the minute, 5 minutes, 10 minutes, 15 minutes, the hour, and many other ways. Are they rounding down? Rounding to the nearest? Using some other recording method?
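Here's a small Python sketch of both points: comparing values only up to the granularity they were recorded at, and one common time-recording method (rounding down to a multiple of N minutes). The function names are just illustrative:

```python
from datetime import datetime, timedelta

def same_to_granularity(a: float, b: float, step: float) -> bool:
    """Treat two values as 'the same' only if they fall in the same
    rounding bucket of width `step` (the recording granularity)."""
    return round(a / step) == round(b / step)

# Two heights that agree at 2 decimal places, and two that don't.
print(same_to_granularity(1.6340, 1.6349, 0.01))  # True
print(same_to_granularity(1.63, 1.64, 0.01))      # False

def floor_to_minutes(t: datetime, minutes: int) -> datetime:
    """One common recording method: round a timestamp *down* to a
    multiple of `minutes`, discarding anything finer."""
    discard = timedelta(minutes=t.minute % minutes,
                        seconds=t.second, microseconds=t.microsecond)
    return t - discard

print(floor_to_minutes(datetime(2008, 7, 3, 14, 57, 30), 5))  # 2008-07-03 14:55:00
```

Notice that the stored float or timestamp alone can't tell you which of these recording methods was used; that has to come from metadata.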

An ontology of commonly used recording methods, describing how each data value was recorded, would allow more justified inferences to be made about the data. This would allow a higher level of trust (in the second, reliability sense explained here) when using the data, which will be of great importance as the semantic web grows.