A blog about ideas relating to philoinformatics (or at least that have something to do with computer science or philosophy)

Saturday, December 11, 2010

Philoinformatics and Categories of Informatics

How does philoinformatics relate to general informatics?
I will once again answer this using MS Paint. I think informatics (information science) can be usefully looked at as a kind of fan-like spectrum from general informatics to specific informatics. On the pointy handle end, you have fundamental informatics. As you move along to the edges of the fan, you find general domain informatics. At the edges on the right you have specific informatics fields such as bioinformatics, socioinformatics, and philoinformatics. Those are the grey and beige slices.

You can think of any very specific topic being placed on the edge of the fan under the title of "Informatics of X" or "X informatics" or, if you're lucky enough to have a nice prefix representing the topic, even "Xoinformatics"!

Domain Informatics
A quick look at the Wikipedia page on informatics shows a discipline begging for some subcategories to organize it. Problems that are solved by the same mechanism across multiple specific informatics disciplines are more appropriately placed deeper (to the left) in this fan picture, in the direction of general fundamental informatics. This is the realm of domain informatics. Domain informatics is arguably the most interesting area of informatics: fundamental informatics is quite stable and almost completely content-neutral, while most advances in a specific informatics domain can only be generalized to a certain point, under certain conditions. I'd put things like information entropy, communication channel theory, cloud computing, and generic encryption issues in the 'fundamental informatics' category. In order to explain where philoinformatics lies in this picture, I'm going to try to identify and categorize domain informatics.

Qualitative vs Quantitative
I think a major distinction to make when categorizing the growing amount of domain informatics is between qualitative and quantitative content. All disciplines, of course, need to deal with both quantitative and qualitative data, but some disciplines (like physics) have quantitative measurement at their cores, while other disciplines (like history) have qualitative reports and observations at their cores.

A Third Q?
Abstract disciplines like philosophy, law, math, economics, and computation, which are "removed" in a sense from direct empirical observation, are interesting cases. They all seem to allow for more rigid models than qualitative observations do, but are generally not amenable to numerical models in the way quantitative measurements are. Unfortunately I can't think of an appropriately catchy word that starts with a 'Q' to add to the Quantity/Quality (false) dichotomy. But I think we can roughly partition all of domain informatics into Feature, Model, and Measurement Informatics. These are the yellow, blue, and red parts of my beautiful map of informatics above.

Categorizing Philoinformatics

Content in philosophy is published in chunks at the paper and book level. Some of these papers can get heavy on symbols, but generally we're talking about free text, and almost never does a paper get heavy on numbers. Philoinformatics for this traditional form of philosophy is largely encompassed by general Publishing 2.0 initiatives, which are part of domain informatics. Registries of philosophers, registries of papers, and construing bibliographies as dereferenceable (aka "followable") URIs are not unique to philosophical publications. This initiative involves simple feature informatics (by which I mean 'simple features', not 'simple tasks'). It's also a task that is extrinsic to philosophy in the sense that it is neutral to the content.

The more radical goal of philoinformatics that I mentioned in my philoinformatics manifesto draft involves cracking into the content itself, whether by extracting from traditional publications or inventing new types of publication. Much of this content will involve trying to serialize identified ideas, concepts, and definitions that would otherwise only be available as unstructured freeform text in regular publications. As important as this is, even these items are somewhat general in that they are going to be used in all kinds of publications. But I should stress that these are the kinds of things that are currently rarely captured in a formal, machine-readable form, and capturing them would be a major enhancement to the entire domain.


So what content is unique to philosophy, or at least almost unique? The motivation for distinguishing philoinformatics (or any subdiscipline of general informatics) is that some quality of its content sets it apart. The content that is almost unique to philoinformatics is the handling of thought experiments, the free use of 'Xian' where X is any philosopher's name, and possibly the modeling of widespread but uniquely philosophical notions like internalist/externalist, foundationalist/coherentist, objective/subjective, absolute/relative, contingent/necessary. With a proper foundation of terms and links combining these items with publications and with endorsement and rejection statements, we could start computing over philosophical notions to find general properties of philosophical positions, such as hidden inconsistencies, distance from evidence, robust multidirectional support, and other relations that could potentially be defined in terms of this foundational data. Basically, traction and then real progress may finally be possible.

Wednesday, December 8, 2010

Sparql on Riak

Graph Data Stores
There are generally two different goals people have in mind when using graph data stores.
  1. Graph-walking from node to node, by following edges in the graph. People usually have social networks or The Social Web in mind.
  2. Dynamic schema or schemaless triplestores. People usually have mass fact databases (aka "knowledgebases") or The Semantic Web in mind.
But under the covers, these two concerns generally overlap as graph data stores.

Riak is a very interesting, open source, homogeneously clustered, document data store. But Riak also supports "links" between documents, which makes it a graph data store as well. These links were designed for the goal of graph-walking. An interesting feature is that this graph-walking is reduced to a series of map-reduce operations so queries are fulfilled in parallel across the cluster.

SPARQL on Riak
SPARQL is a query language designed for triplestores (the second goal), but I see no reason why you couldn't use it for graph data stores in general, at least in theory. So if you can reduce SPARQL queries down to Riak queries, then you automatically get your SPARQL queries reduced down to map-reduce operations. Riak even supports something similar to SPARQL property paths, where you can keep intermediate results while following links, so it might not be too difficult to reduce most types of SPARQL queries. One concern I have (after the main concern of whether the reduction is possible) is whether Riak can handle billions of tiny "documents", which would essentially just be URIs unless you wanted to store an associated document with each URI.
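To make the reduction a bit more concrete, here's a rough sketch (in Python, just to build the job description as data) of how a property-path-style chain of triple patterns might be turned into a Riak map-reduce job, with one link phase per hop. The bucket names, tags, and starting key are all invented for illustration, and the exact phase format and built-in map functions should be double-checked against the Riak docs.

import json

# Hypothetical translation of a SPARQL-ish path query like
#   <amcknight> :knows/:livesIn ?city
# into a Riak map-reduce job: each predicate in the path becomes a "link" phase.
def path_to_riak_job(start_bucket, start_key, hops):
    """hops is a list of (bucket, tag) pairs, one per predicate in the path."""
    query = [{"link": {"bucket": bucket, "tag": tag, "keep": False}}
             for bucket, tag in hops]
    # Final phase: pull the linked objects' JSON values back out.
    query.append({"map": {"language": "javascript", "name": "Riak.mapValuesJson"}})
    return {"inputs": [[start_bucket, start_key]], "query": query}

job = path_to_riak_job("people", "amcknight",
                       hops=[("people", "knows"), ("cities", "livesIn")])
print(json.dumps(job, indent=2))

The resulting JSON document is the kind of thing that gets POSTed to Riak's /mapred endpoint, and Riak fans the phases out across the cluster, which is exactly the parallelism mentioned above.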

Infinitely Scalable Triplestore
One goal I would like to see achieved is an infinitely scalable triplestore. Or, if "infinite" is too strong a word, let's say a triplestore that can handle an order of magnitude more triples than the biggest triplestores out there. This SPARQL on Riak proposal might actually be able to pull it off. The queries might be unbearably slow, but they should complete reliably even if they take hours, days, or months. Creating some sort of major plug-in for Apache Hive that handles SPARQL-like queries (Hive currently supports SQL-like queries) might be a better way to build an infinitely scalable distributed triplestore, but doing so would be much more difficult.

Tuesday, December 7, 2010

Subjective Consistency

I've been researching all kinds of data stores (well, actually relational, key-value, and document data stores) and I've become aware of an interesting constraint on distributed data stores known as Brewer's CAP Theorem. The idea is that you can't have Consistency, Availability, and Partition tolerance simultaneously in any distributed data store. It looks like it's difficult to get complete consistency even on a single node (see: isolation levels) and it's thought to be impossible to get at network scale (because of the CAP theorem). This is where "Eventual Consistency" usually comes in, relaxing consistency in exchange for availability and partition tolerance.

Interaction-Centric Consistency
Hopefully I can frame my idea properly now that I've confused you with some terminology. My initial thought was: what kind of guarantees can a data store offer if a single user or application talks to the same node in the network? We could call this a "data session" or an "interaction". It's a kind of network transaction idea, looser than a data transaction. Anyways, I wonder if you could guarantee a stronger level of consistency by using your distributed network in this way. There might be a way to offer an apparent, or subjective, temporary consistency. Ultimately, the idea is that if we make use of the patterns of access we expect from our users, then we may not need strict distributed consistency in the first place for a good number of applications.
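To give the idea a bit of shape, here's a toy Python sketch (no real data store involved) of a client-side router that pins every data session to a single node, which is the minimal mechanism the idea needs. The node names are made up.

import hashlib

class SessionRouter:
    """Pin every data session to one node so that reads within the session
    see that session's own writes (a local, 'subjective' consistency;
    other sessions may still see stale data)."""

    def __init__(self, nodes):
        self.nodes = nodes

    def node_for(self, session_id):
        # Deterministic hash: the same session always talks to the same node.
        digest = hashlib.sha1(session_id.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

router = SessionRouter(["node-a", "node-b", "node-c"])
assert router.node_for("alice-session-42") == router.node_for("alice-session-42")
# Within "alice-session-42", every read and write goes to the same node, so the
# user observes their own updates immediately even though the cluster as a
# whole is only eventually consistent.

The obvious caveat is that this only holds while that node stays reachable and keeps its keys, which is exactly where the harder questions about this kind of guarantee begin.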

Wednesday, December 1, 2010

Bridging Ontologies: The Key to Scalable Ontology Design

It's been years since I've created an ontology (in the computing/informatics sense), but I'm going to give some advice on creating them anyways. When creating an ontology, it can be helpful to connect it up to other related ontologies. In fact, I think this is a requirement for building the semantic web (taking 'ontology' in a broad sense). You may want to ground your ontology (i.e. connect it to more generic or foundational ontologies, towards upper ontologies) or connect it to well-known ontologies, increasing the potential usefulness and adoption of your own. Whatever the reason, there are benefits in doing so if you want the data that your ontology schematizes to be more easily and automatically reusable. The potential downside is that you force your users to endorse the ontologies you're connecting to. So how exactly should one connect their ontology into the ontology ecosystem?

Most ontologies out there seem to me to be part of a stack of ontologies built by a single group of people. The ontologies tend to build directly on top of each other, meaning "lower" ontologies directly reference "upper" ontologies. Since the ontologies are developed by a single organization, it seems to make sense to connect to them directly, because the organization (arguably) knows exactly what they are attempting to represent, or what they mean. The fact that organizations tend to keep their ontologies rather isolated may be caused by a fear of committing to ontologies they didn't create.

The way ontologies are (or at least should be) developed allows for changes and updates. To accommodate this, one should develop ontologies with versioning. This way, someone using your ontology won't ever have it change on them, and the developers can still maintain and change the ontology by introducing new versions. It's as simple as adding a version number to your ontology's URL.

But this brings up the problem we face by directly referencing other ontologies. Let's imagine you have an ontology X that makes reference to another ontology Y, and that ontology Y has a newer version available. You're planning on updating a term in X to make reference to essentially the same term in the newer version of Y, to keep X up to date. So you update X to the new version of Y even though X basically hasn't changed its meaning. The role an ontology fulfills is to describe a certain subject or topic, and this intrinsic meaning has not changed. Yet you still need to change your ontology. Under these conditions, no matter how much consensus forms around the accuracy of your ontology, you will never know when it is stable. In fact, this leads to a cascade of updates and changes required by every ontology that references yours, and so on. This is not a distributed, web-scale ontology design pattern. We need a way to decouple our ontologies.

So, is there a design pattern we can use to avoid these dangers and burdens of connecting to other ontologies? Can we do better than simply identifying good stable ontologies and directly referencing only those ontologies in our own ontology? Yes!

Introducing: Content vs Bridging Ontologies

The key to scalable ontology design is what I call Bridging Ontologies. You write your intended ontology without referencing other ontologies and then create a separate ontology that is mainly made up of owl:sameAs and rdfs:subClassOf relationships between your terms and the target ontology's terms. I call these ontologies Content Ontologies and Bridging Ontologies, respectively. You only need to update your Bridging Ontology when either the source or target ontology changes. The nature of a Bridging Ontology makes it useless for anyone to reference in their own ontologies, which stops any potential cascade of changes throughout the web of ontologies. Of course users would still need to use the Bridging Ontologies and would likely need to collapse/deflate the owl:sameAs relationships into single terms for most visualizing, processing, or reasoning purposes.
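To make this concrete, here's a rough sketch of a Bridging Ontology built with Python's rdflib. The namespaces and term names are invented, and the version numbers in the URLs follow the versioning advice above. (When the bridged terms are classes, owl:equivalentClass is arguably the more appropriate property than owl:sameAs, but I'll mirror the wording above.)

from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDFS

# Hypothetical namespaces: MUSIC is "my" Content Ontology, OTHER is the
# target ontology I want to bridge to. Note the versions baked into the URLs.
MUSIC = Namespace("http://example.org/music-content/1.0/")
OTHER = Namespace("http://example.org/other-music-ontology/2.3/")

bridge = Graph()
bridge.bind("music", MUSIC)
bridge.bind("other", OTHER)

# A Bridging Ontology is almost nothing but equivalence/subsumption links.
bridge.add((MUSIC.Track, OWL.sameAs, OTHER.Track))
bridge.add((MUSIC.Genre, RDFS.subClassOf, OTHER.MusicCategory))

bridge.serialize(destination="music-bridge-1.0-to-other-2.3.ttl", format="turtle")

When either side releases a new version, only this little file needs to change; the Content Ontologies on both ends stay put.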

I'll go out on a limb here and say that every ontology anyone creates should be isolated in this manner. The vision then becomes a web of ontologies: small Content Ontology nodes that satisfy specific "semantic roles", with Bridging Ontology edges defined between these Content Ontologies. Since you don't need to adopt all of the Bridging Ontologies that are built for a Content Ontology, it is much easier to reach consensus on the Content Ontologies and then pick and choose your Bridging Ontologies, choosing to commit (or not) to exactly how that content fits into the big picture. This allows for decoupled semantics rather than traditional inflexible semantics.

Tuesday, November 30, 2010

The Universal Model

Imagine a computer model of the universe. The past, the present, the future, every event that has or will ever occur. This is The Universal Model.

Clearly there would be some major limitations on the quality of this model. You can't have a perfect model of the present stored in the present (not to mention storing the past 13.7 billion years or the next googols of years predicted by our current best cosmological theories). What I have in mind is a kind of massive service where you can ask for information about the universe and retrieve those answers if they are available. The Universal Model is an abstract model that is, at best, partly built, as needed, in order to fulfill a query about it.

Let me try to flesh the idea out a little more. Imagine any system that needs to "answer a question" of some sort. Maybe you're typing a question into a search box. Maybe you're "asking" Google Earth to display a useful image of downtown Seattle. Maybe you're checking the weather for next weekend. Now, imagine a system that can fulfill the query in terms of questions about the way the universe is, plus some processing. In other words, imagine converting or compiling the question down into explicit questions about the universe. For Google Earth this could amount to rendering the image of what would be seen from a specific point and orientation above Seattle, right now. For the weather, you need to look at the 4D volume of space above your location from Saturday morning until Sunday evening. You'd then need to process the physical description to extract the amount of cloud and rain that's present, and then convert that into a label like "sunny" or "partly cloudy" based on the context.

Now, again, since we can't store the whole universe, we need methods to fulfill the queries about The Universal Model. This could be done in a number of ways. If only a small subsection of the model is required, which would be the case in the vast majority of queries, then you could invoke specialized services designed to answer queries of that type. Queries about the nature of the weather on the weekend could be fulfilled by invoking weather models, for example.
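Here's a toy Python sketch of that dispatching idea: every query is treated as a question about some region of spacetime plus an aspect of interest, and The Universal Model just routes it to whichever registered specialized model claims to cover it. The query format and model names are entirely invented.

# Toy sketch: The Universal Model as a dispatcher over specialized models.
class WeatherModel:
    def covers(self, query):
        return query["aspect"] == "weather"

    def answer(self, query):
        # A real implementation would run a forecast model over the 4D region;
        # here we just return a canned description.
        return {"region": query["region"], "summary": "partly cloudy"}

class UniversalModel:
    def __init__(self):
        self.models = []

    def register(self, model):
        self.models.append(model)

    def ask(self, query):
        for model in self.models:
            if model.covers(query):
                return model.answer(query)
        raise LookupError("no specialized model covers this part of the universe")

um = UniversalModel()
um.register(WeatherModel())
print(um.ask({"aspect": "weather",
              "region": {"where": "your location", "when": ("Sat morning", "Sun evening")}}))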

Consider an analogy. When one wants to automatically translate a document from one language to another, one method would be to build a language-agnostic conceptual representation of the meaning expressed by each sentence and then to express that conceptualization in the other language. This solution would be extremely powerful because it preserves all relevant information and is scalable. (For any new language, you only need to be able to map it to The Language Model and then express that model in any language you want.) The downside of this approach is that we haven't been able to achieve any kind of mapping like this, partly because we don't know exactly what The Language Model would look like and because language is very context-dependent. Instead we build lower-level representations, though not as "low" as The Language Model, and then take a shortcut at that level of representation.

This is what would need to be done with The Universal Model as well. We take shortcuts while "translating" our query into an answer. This is a general problem for any sort of "conversion".
(Forgive the MS Paint.) The boxes are representations of queries (red) and answers (green). As you go lower in the layers, the representations are more abstract, more powerful, and more fully answerable. Each arrow is a conversion. The horizontal arrows are hacky conversions that skip the lower, more ideal and powerful, but unobtainable layers and conversions. For both ideal models (The Language Model and The Universal Model), we get as close as we can to the ideal, so that we can use "hacks", shortcuts, and ingenious tricks to jump to the other side of the gap. For the NLP example, the top layer could start with a document in Spanish and end with a document in English. The bottom layer would be The Language Model.

If, like me, you believe that all propositions are made true or false by the universe, then in the case of The Universal Model, the top layer would be any coherent question. So even our idealistic Language Model could be seen as a special case "hack", albeit at one of the lower levels of this hypothetical universal question answering device, that has The Universal Model at the bottom. The Universal Model is the most ideal model possible.


That's the idea anyways; I'm not sure it fully holds up; Take it with a grain of salt; etc; etc.
I wanted to get the idea out there rather than dwell on it too much and then end up not posting anything. Please comment and refine and clarify the idea further for me if you find it interesting!

Saturday, October 9, 2010

Installing Node.js + DBSlayer on Windows using VirtualBox

Recently I've been interested in the idea of using server-side javascript because:

  1. I've recently been introduced to jQuery, which is an insanely powerful improvement over the javascript I became familiar with in high school... like 10 years ago.
  2. The idea that I can use my (MVC) Model on the server and client side seems pretty elegant. I'd be able to send objects back and forth using AJAX without having two different implementations of the objects.
  3. The server-side javascript project that sparked my interest, node.js, just looks so easy. Building a web service should take no time at all and deploying the web service looks even easier. You just call:
    node myjavascriptfile.js
Also, using DBSlayer as a JSON-encoded HTTP wrapping of MySQL and using JSON-Template for building the View, I can have all serialization done in JSON and all communication over HTTP. Since the templating uses JSON, there's potential for simply binding services to templates without any transformation required whatsoever. There is also a node.js plugin that makes DBSlayer super simple to use with node.js. I haven't actually used any of these pieces before, but they all seem to fit together so well that I thought I'd give it a shot.
Unfortunately, installing all of this was more difficult than I had hoped, especially because I'm developing on a Windows XP laptop and node.js currently only works on Linux. Fortunately, there are two tutorials that have been written for this exact situation, one for getting node.js onto an Ubuntu VM and another for installing DBSlayer on that same VM. But though these tutorials are pretty good, they didn't work perfectly for me. I'll go over the problems I ran into along the way in hopes that someone (possibly my future self) can benefit from the added tips.

A quick note: I'm relatively new to VirtualBox so I made one awkward mistake that must be quite common. I left my virtual Ubuntu installation CD in my virtual CD drive! If it looks like your virtual drive is not being fully used, e.g. it says your 3.1GB drive is full even though you created a much bigger drive, then you've probably made a similar mistake. Took me 2 hours to figure that one out. I guess I should consider myself lucky.

The first problem I ran into with the first tutorial involved OpenSSL. Along with the few other apt-get installs, I needed to do a:
sudo apt-get install libssl-dev

When validating my node.js install using the helloworld.js described in the first tutorial, the script didn't work. The error I saw was:

/home/amcknight/node/helloworld.js:4
 res.sendHeader(200, {'Content-Type': 'text/html'});
     ^ 

TypeError: Object #<a ServerResponse> has no method 'sendHeader'
    at Server. (/home/amcknight/node/helloworld.js:4:6)
    at Server.emit (events:27:15)
    at HTTPParser.onIncoming (http:885:14)
    at HTTPParser.onHeadersComplete (http:88:31)
    at Stream.ondata (http:806:22)
    at IOWatcher.callback (net:499:29)
    at node.js:604:9


Instead I used the first hello world code shown on the node.js front page and everything worked out fine.

As for the second part of the tutorial for installing dbslayer, I recommend running

sudo apt-get update
first to avoid any errors.
The error I was getting involved not being able to find a libapr-1.so.0 file. I'm not sure what exactly resolved this problem for me but I eventually started over (and found those two tutorials) and the only difference I can think of is that I used the apt-get update command and installed libssl-dev the second time around. So unfortunately I can't pinpoint the cause, but if it ever happens to me again I'll post a comment about it (or you can).

Other than that, the tutorial should work fine if your fingers are tightly crossed.

Ultimately, these are the relevant commands I ran, in order, and not including the validation steps that are explained on the two tutorial pages. From a clean install of Ubuntu 10.04, taking the steps below should get you up and running.



sudo apt-get install g++
sudo apt-get install git-core
git clone git://github.com/ry/node.git
sudo apt-get install libssl-dev
 

cd node
./configure
make
sudo make install
cd ..
 

sudo apt-get install subversion
 

svn co http://www.dbslayer.org/svn/dbslayer/trunk dbslayer
sudo apt-get update
sudo apt-get install libapr1-dev
sudo apt-get install libaprutil1-dev
sudo apt-get install libmysqlclient-dev
 

cd dbslayer
./configure
make
sudo make install

sudo apt-get install mysql-server
dbslayer -s localhost -u root -x YOUR_PASSWORD -c void

Saturday, July 17, 2010

Philoinformatics Manifesto

I wrote this draft philoinformatics "manifesto" in April and have been meaning to polish it up and post it.

Yeah... it's July now.

I still intend to clean it up, but I decided to post my draft, as is, just to get it out there for now. Call it the beta version... actually more like alpha. All comments very welcome at this point.

Philoinformatics is a scientific, philosophical, and most of all an engineering discipline with the single goal of radically enhancing philosophy using information systems. If you're thinking about Artificial Intelligence that does philosophy for us then you've got the wrong (but awesome) idea. I'm talking about philosophy being done by people, but being done better, with the whole philosophical process being enhanced from end to end with helpful software made possible by underlying information systems.

We all need to recognize that philosophy as a discipline is devastatingly problematic and requires new life. Philosophy is known to lack proper resolution to patently philosophical problems and is fraught with pervasive disagreement among its experts, among other major problems. Just consider the thousands of man-years of thought put into the Free Will and Determinism topic, for example. Many philosophers do recognize that these and other problems exist, and even usually spend some time thinking about them and/or joking about them, but they also essentially practice a kind of denial about their existence. Maybe it's because doing philosophy still feels important and individual philosophers are making personal progress, in the sense that they are acquiring and refining their own philosophical concepts. Philosophy is a slow and difficult process, and philosophers are wasting the vast majority of their time doing what I believe they themselves would agree is a waste of their time, if only they had a better way to view the actual landscape of philosophy: the landscape of the content of philosophy.

Despite this heavily pessimistic view of philosophy, I am not advocating for Skeptical Metaphilosophy, the view that philosophy does not have intrinsic value. I also, of course, don't claim to be able to reliably recognize when I or anyone else is specifically wasting their time doing philosophy. What I am advocating is that we build the systems required to show whether there is, where there is, and when there is valuable philosophy to be done. After thousands of years of philosophers spinning their wheels on philosophical problems, I think it's fair to suggest that some focus needs to be put into novel methods for making discipline-wide progress. For those that are able, instead of spending time refining one's own philosophical positions and attempting to make personal philosophical progress, I advocate putting some work into progress for the discipline as a whole.

Speaking of progress, imagine a wheel spinning on a road. The wheel's spinning is the effort of philosophers and the forward motion is the progress of philosophy. If philosophy isn't making progress because of the nature of the content of philosophy (e.g. "words on holiday" or some other confusion), then our wheel is slippery: the problem is intrinsic. If instead philosophy isn't making progress because of the way philosophy is done or the current environment of philosophy (e.g. pervasive repetition of ideas, a high barrier to entry for publishing, or the unknown status of philosophical positions), then our road is too slippery: the problem is extrinsic. For progress we need traction, and for traction we need both the wheel and the road to have grip. Skeptical Metaphilosophy is the view that our wheel can't be made grippy. Naturalism could be construed as the view that scientifically supported positions are the only grippy parts of the wheel. Both are views about the wheel, philosophy itself. Philoinformatics is an attempt to give the road grip: to set up an environment where philosophy can get all the traction it possibly can. Of course, the wheel could be incurably slippery, in which case giving the road grip won't help, but at least then you know where the grip is missing. In other words, philoinformatics at bare minimum can provide evidence for or against Skeptical Metaphilosophy.

Philoinformatics is an attempt to face the problems of philosophy head on by building mechanisms that help people understand the actual landscape of philosophy. Actually, I intend philoinformatics to be more general than what I've been advocating. Here's the more general form:

  1) Identify symptoms
  2) Identify the underlying qualities giving rise to those symptoms
  3) Design systems to modify those qualities of philosophy
  4) Develop the systems

As simple as these steps may sound, all four of them are quite difficult. You might also notice that, despite what I started out saying, philoinformatics doesn't necessarily need to radically enhance philosophy. Any enhancement will do and would count as work in philoinformatics. But I stand by the "radically" part of the goal because I think philosophy requires radical enhancement, and I hope philoinformatics is an avenue for getting us there.

Tuesday, March 23, 2010

Philoinformatics

I've gone ahead and impulsively bought philoinformatics.com on a surge of possibly-too-much-coffee and currently just have this blog as the main and only page. I renamed the blog to match, at least for now, and intend to keep the blog rolling probably under philoinformatics.com/blog when I get a main page set up. I'm planning on doing some writing for at least the next month or so in order to put together an enticing philoinformatics manifesto/vision (edit: draft) that will hopefully inspire a small community into forming around the idea of enhancing philosophy by building new software. Stay tuned!

Monday, March 15, 2010

Conceptual Space Markup Language (CSML)


I've recently come across a great paper by Benjamin Adams and Martin Raubal called Conceptual Space Markup Language (CSML): Towards the Cognitive Semantic Web. I found their paper interesting on many levels because it lies at the nexus of many rather diverse topics that I’m interested in. CSML directly involves Computational Geometry and the Semantic Web, but indirectly involves the philosophy of meaning, mind, and color. Also, as it grows in popularity, I believe force-based organizational algorithms and neural networks will become a heavily used mechanism for generating and using CSML data. Basically, CSML takes Conceptual Spaces, which are already at an interesting intersection of multiple mind sciences and multiple strands of philosophy, and connects them with multiple threads of engineering and informatics.

Wait.
What are Conceptual Spaces?

Conceptual Spaces are multidimensional spaces made up of quality dimensions. Quality dimensions are basically any properties you can think of that have a (pseudo-)continuous range of values. Think: size, mass, brightness, beauty, craziness, unicornity, or anything else that you think you can make sense of on some sort of numerical range. You can now consider points and (convex) shapes within your set of quality dimensions, which will correspond to concepts. Consider the classic-cool-colour-cone example to the lower right. That’s a representation of a conceptual space with quality dimensions of hue, value, and saturation.
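Here's a minimal Python sketch of the idea, with invented quality dimensions and concepts modelled as balls around prototype points (a crude stand-in for proper convex regions; hue is really circular, but it's treated as a plain number here for simplicity).

import math

# Toy conceptual space with three quality dimensions: (hue, saturation, value),
# each scaled to 0..1.
class Concept:
    """A concept as a prototype point plus a radius: a crude convex region
    (a ball) in the quality-dimension space."""
    def __init__(self, name, prototype, radius):
        self.name, self.prototype, self.radius = name, prototype, radius

    def distance(self, point):
        return math.dist(self.prototype, point)

    def contains(self, point):
        return self.distance(point) <= self.radius

RED = Concept("red", prototype=(0.0, 0.9, 0.9), radius=0.25)
PINK = Concept("pink", prototype=(0.95, 0.4, 0.95), radius=0.2)

observation = (0.02, 0.8, 0.85)   # an observed colour
print(RED.contains(observation))                                # inside the 'red' region
print(RED.distance(observation) < PINK.distance(observation))   # closer to red than pink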

But this isn’t just a fancy mathematical model. Structures like this are in some sense actually “held” by neurons in your brain. You may have heard people talk about "levels of reality" (or "levels of description of reality") such as the physical level, biological level, psychological level, sociological level, etc. Well, the idea is that when you look into the brain, which is (of course) very complex, you can look at it "at different levels" (presumably of granularity in this case). Conceptual Spaces are one of those lesser-known but very useful levels between the neuronal level and the psychological level (and below the symbolic level if you think there is one). Other people have much more comprehensive and informative explanations that I’m not going to try to repeat here. If you’re interested in the multitude of potential philosophical implications, check out Paul Churchland’s State-Space Semantics and Meaning Holism. (He’s also got some good stuff to say about colors outside of the classic-cool-color-cone here.)

CSML describes Conceptual Spaces

CSML, the Conceptual Space Markup Language, is an XML serialization of conceptual spaces. CSML brings a whole new engineering dimension to conceptual spaces which fits into the realm of Semantic Web technologies and is actually analogous to OWL, but with radically different implications. In my previous post, which was actually written in June 2008, I dreamed of a “smooth semantic web” that didn’t always require rigid categorization. CSML looks like a better candidate to handle that kind of data. CSML is specifically designed to handle context-dependent meaning and (relative) similarity of concepts, which are both difficult to handle in OWL. (How would you represent a large squirrel and a tiny planet consistently? What about brightness, beauty, craziness, and unicorniness?) After trying to use a few units ontologies for measurement data (at iCAPTURE for Mark Wilkinson) I also have high hopes that CSML can help simplify the problems on that front.

The most exciting part for me is that conceptual spaces lend themselves to fancy techniques for being automatically generated by way of artificial neural networks and force-based organizational algorithms, which brings in a few more of the theoretical engineering topics I’ve been interested in over the years. Starting with similarity data, force-based (or tension reduction) algorithms could help identify quality dimensions. Neural networks can nicely use quality dimension coordinates as input and also learn to precisely place items into conceptual spaces. I can’t wait to see the tools that will be created to allow for generating, reasoning over, and visualizing CSML data and how they will integrate with existing semantic web technologies and machine learning techniques.
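As a taste of the force-based side, here's a bare-bones Python/numpy sketch that takes a matrix of pairwise dissimilarities (e.g. derived from similarity judgments) and nudges points around in a low-dimensional space until the distances roughly match; the recovered axes then stand in for candidate quality dimensions. This is just gradient descent on the usual multidimensional-scaling stress, not any particular published algorithm, and the data is invented.

import numpy as np

def force_layout(dissimilarity, dims=2, steps=2000, lr=0.01, seed=0):
    """Move points so pairwise distances approximate the dissimilarity matrix
    (a bare-bones stress-minimizing multidimensional scaling)."""
    rng = np.random.default_rng(seed)
    n = dissimilarity.shape[0]
    pos = rng.normal(size=(n, dims))
    for _ in range(steps):
        diff = pos[:, None, :] - pos[None, :, :]      # (n, n, dims)
        dist = np.linalg.norm(diff, axis=-1)          # (n, n)
        np.fill_diagonal(dist, 1.0)                   # avoid divide-by-zero
        stretch = (dist - dissimilarity) / dist       # spring "tension" per pair
        np.fill_diagonal(stretch, 0.0)
        pos -= lr * (stretch[:, :, None] * diff).sum(axis=1)
    return pos

# Invented dissimilarity judgments among four colour words (0 = identical).
D = np.array([[0.0, 0.2, 0.9, 0.8],
              [0.2, 0.0, 0.8, 0.9],
              [0.9, 0.8, 0.0, 0.3],
              [0.8, 0.9, 0.3, 0.0]])
print(force_layout(D))   # each column is a candidate quality dimension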

Saturday, March 13, 2010

Towards a Smooth Semantic Web

This is an almost complete draft I wrote almost 2 years ago. I thought I should publish it because it is basically complete, and because my next post will be about the new language and technologies that will make the "smooth semantic web"  a reality. And for the record, 'smooth semantic web' was a provisional name to be changed before publishing.
Here it is:

The Semantic Web is slowly building up and will eventually grow to a critical mass where it will become useful. How useful is another question. Will it revolutionize the web? Web 3.0? Maybe. Maybe not. But it will definitely have a use and an effect. The current standards such as RDF, OWL, SWRL, SPARQL, and SKOS are good, and still have a ton of potential that is growing exponentially as we speak. But these standards are not enough. OWL and SKOS can only capture so much knowledge. They can capture crystallized categorizations of well-defined concepts. But much, if not the vast majority, of knowledge is not found in strict categorization and necessary relationships. So what is missing?

People have made attempts to extend OWL in a few different ways to make it cover a wider range of statable knowledge. People have looked into adding probability, non-monotonicity (time), and belief. I propose a different addition which, if adopted, would add a prior layer to OWL, just as the above attempts do.

Consider a music ontology. OWL can support certain relationships such as: Rock ISA Genre, Track isPublishedOn CD, etc. But certain pieces of knowledge cannot be represented because they haven't been crystallized to the point of being definable, especially when you consider the categorization of instances. Imagine the daunting task of deciding whether certain borderline songs count as belonging to a specific genre. But consider the task (still daunting, but not as much) of deciding whether a song is "better classified as Rock than Rap", for instance. It may be difficult to say that a song is Rock, or is Rap, but it may be clear which one it is closer to. Much more can be said.

Notice that we are not talking about probabilities here. It isn't semantically correct to say that a song is "more likely" categorized as Rock than Rap. We are saying that if we were to categorize it as either Rock or Rap, it would be better categorized as Rock than Rap.

You may be wondering what the criterion for "better" is or should be. This itself should be represented within a knowledgebase.

So how would we use this knowledge? SWRL and SPARQL can only handle deductive reasoning, so they won't help. There are two options, the way I see it. One is to take my smooth knowledge and use some kind of classifying step to derive a regular rigid knowledgebase. The other is to invent smooth reasoners. I think vector space representations and abduction would be important for this step.
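Here's a tiny Python sketch of what the "smooth" statements and the classifying step might look like as data and code. The relation, the closeness scores, and the threshold are all invented; the scores are explicitly not probabilities, just a ranking of how well each category fits.

# Toy "smooth" genre knowledge: instead of asserting that a song is of type Rock,
# store graded closeness scores per candidate category.
smooth_kb = {
    "song:StairwayToHeaven": {"genre:Rock": 0.85, "genre:Folk": 0.55, "genre:Rap": 0.05},
    "song:LoseYourself":     {"genre:Rap": 0.95, "genre:Rock": 0.35, "genre:Pop": 0.40},
}

def better_classified_as(subject, a, b):
    """The smooth statement: subject is better classified as a than as b."""
    scores = smooth_kb[subject]
    return scores.get(a, 0.0) > scores.get(b, 0.0)

def crystallize(threshold=0.7):
    """The classifying step: collapse smooth knowledge into rigid triples by
    keeping only the best-fitting category, and only if it clears a cutoff."""
    triples = []
    for subject, scores in smooth_kb.items():
        best, score = max(scores.items(), key=lambda kv: kv[1])
        if score >= threshold:
            triples.append((subject, "rdf:type", best))
    return triples

print(better_classified_as("song:StairwayToHeaven", "genre:Rock", "genre:Rap"))  # True
print(crystallize())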

Sunday, February 14, 2010

Reviving this blog!

I haven't been blogging for about 18 months and I've decided to give it another shot. So if anyone's following (doubtful at this point) expect a post at least every two weeks or so. I'm hoping to document my attempts at getting into grad schools, engineering tips and tricks I've found, interesting philosophy papers, and odds and ends and ideas that I come across from now on.