Speech to Text Software

We’ve been contemplating buying the Dragon NaturallySpeaking software so that The Missus can “write” content for our new project while holding a baby.  It would also be a lot easier for her to dictate everything–she’s an excellent speaker–as fast as she wants to go.  I can go back in later and edit if needed, but the reviews I’ve read of NaturallySpeaking say it does a good job of punctuating on its own.

I’m also thinking it would be great for my dad.  He’s losing his eyesight rapidly, and can hardly read a computer screen anymore.  I was wondering if anyone had any experience with this software.  A guy I used to work with swears by it, just because he can’t type.  It says it can handle over 100 words per minute of human speech.  Being from the South, that should be plenty for my family and me.  🙂


Sort The Viewers, Not The Movies

My buddy IB sent this article to me…very interesting.  Netflix is running a contest for data crunchers and offering $1M to anyone (or any team) that can beat their current recommendation system by 10%.  One of the leaders is a psychologist working by himself who is looking less at raw data and more at human nature.

One such phenomenon is the anchoring effect, a problem endemic to any numerical rating scheme. If a customer watches three movies in a row that merit four stars — say, the Star Wars trilogy — and then sees one that’s a bit better — say, Blade Runner — they’ll likely give the last movie five stars. But if they started the week with one-star stinkers like the Star Wars prequels, Blade Runner might get only a 4 or even a 3. Anchoring suggests that rating systems need to take account of inertia — a user who has recently given a lot of above-average ratings is likely to continue to do so.
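To make the inertia idea a little more concrete, here is a rough sketch of how a rating might be corrected for a viewer's recent streak. The function name, the `weight` parameter, and the choice to subtract a fraction of the drift are all my own assumptions for illustration; nothing here comes from the contest or from Netflix.

```python
def debias_rating(rating, recent_ratings, user_mean, weight=0.5):
    """Pull a rating back toward where it might have landed without anchoring.

    rating         -- the stars the viewer just gave
    recent_ratings -- the viewer's last few ratings (the "anchor")
    user_mean      -- the viewer's long-run average rating
    weight         -- assumed fraction of the drift to remove (0 = no correction)
    """
    if not recent_ratings:
        return rating  # nothing to anchor on, so leave the rating alone
    recent_mean = sum(recent_ratings) / len(recent_ratings)
    drift = recent_mean - user_mean  # positive after a run of generous ratings
    return rating - weight * drift


# After a run of four-star movies, a five-star rating gets discounted a bit.
after_good_week = debias_rating(5, [4, 4, 4], user_mean=3.0)
# After a run of one-star stinkers, a three-star rating gets bumped up.
after_bad_week = debias_rating(3, [1, 1, 1], user_mean=3.0)
```

A real system would have to learn the weight from data rather than pick 0.5 out of the air.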

I think this guy is onto something, and I’d like to see this move a step further.  Associating movies using k nearest neighbor is relatively straightforward, but attacking the other side of the equation (the viewer) is a lot tougher.  Here’s an example…

“The Outlaw Josey Wales” is one of my favorite movies, but that doesn’t mean that an algorithm could spit out a bunch of westerns and give me something I like.  Clint Eastwood movies wouldn’t do it either, but it would be a little closer.  The real way to suggest movies for me would be to look at some other factors that aren’t so obvious.  You need to be able to draw conclusions from my other favorites–“Fight Club”, “Pulp Fiction”, “Smokey and the Bandit”, and “Swingers”.  You may peg all of these as “guy movies”, but that doesn’t mean I’m going to like “Gladiator”.  In fact, I hated “Gladiator”.  A movie like “Thelma and Louise” is a much better suggestion for me than “Gladiator”.  Why?  Because it is much more quotable, and that’s something my favorite movies suggest that I like.

Just an example, but that’s the direction we’re going.  In order to make a powerful suggester for anything (books, movies, music, raincoats, etc.), it is now necessary to consider the individual making the purchase instead of taking a one-size-fits-all approach.  How else can you help a guy like me who hates sci-fi but loved “The Matrix” and can’t stand to watch horror flicks but has seen “Scream” several times?

I’m oversimplifying it a bit, but this is a very difficult problem.  You’re basically tasked with generalizing a solution which has to consider literally millions of individual problems within the problem.  It’s very tough to quantify so many parameters in so many dimensions.
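For comparison, the “relatively straightforward” half of the problem, associating movies by nearest neighbor, can be sketched in a few lines. The ratings below are made up, the movie and viewer names are placeholders, and cosine similarity is just one assumed way of measuring how “near” two movies are. It is a toy, not how any real recommender works.

```python
import math

# Hypothetical data: movie -> {viewer: stars}. Invented purely for illustration.
ratings = {
    "Movie A": {"viewer1": 5, "viewer2": 4, "viewer3": 1},
    "Movie B": {"viewer1": 4, "viewer2": 5, "viewer3": 1},
    "Movie C": {"viewer1": 1, "viewer2": 2, "viewer3": 5},
}


def cosine(a, b):
    """Cosine similarity between two movies over the viewers who rated both."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[v] * b[v] for v in shared)
    norm_a = math.sqrt(sum(a[v] ** 2 for v in shared))
    norm_b = math.sqrt(sum(b[v] ** 2 for v in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_movies(title, ratings, k=2):
    """Return the k movies whose rating patterns look most like `title`."""
    target = ratings[title]
    scored = [
        (cosine(target, other), name)
        for name, other in ratings.items()
        if name != title
    ]
    scored.sort(reverse=True)  # most similar first
    return [name for _, name in scored[:k]]


closest = nearest_movies("Movie A", ratings, k=1)
```

Notice that nothing in here knows anything about the viewer beyond the stars they handed out, which is exactly the limitation I’m complaining about.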

What amazes me most is that this is such a simple task for us to complete in our heads.  Computers are still so far behind us in our ability to do something as simple as watch a movie and think to ourselves, “That movie sucked, but my buddy really likes movies like this…I think I’ll suggest it to him.”

Yeah, What He Said

The other day I posted a meandering attempt at not ranting about information technology and the manufacturing sector. Today, Seth Godin wrote a post about basically the same thing. The difference is, his post actually makes some sense.

Talent is too smart to stay long at a company that wants it to be a cog in a machine. Great companies want and need talent, but they have to work for it.

Stop whatever you are doing and read the whole post. If you don’t read Seth, you probably should. Whether you are the guy running the show or the guy who sweeps the floors at night, he has great insight delivered daily for free.

And here’s a nice bit of irony for you…Seth Godin’s blog (for whatever reason) is blocked by our corporate IS department. Luckily, the concept of RSS feeds and readers hasn’t trickled down to them yet, so we can still read whatever we want through them.

Give them a few years and they’ll get Google Reader blocked as well.

Yahoo! With a Huge Web Hosting Announcement

The other day I mentioned Microsoft’s bid to buy Yahoo!

Today, Yahoo! made a pretty interesting announcement…Yahoo! Web Hosting now provides UNLIMITED disk space and UNLIMITED bandwidth for less than $12. That means that those videos you’ve been uploading to YouTube (owned by Google) because they are big and take up bandwidth can now be hosted cheaply and you can keep your assets for yourself.

They are even registering your domain name for free, plus unlimited MySQL databases and email addresses. If you’ve been thinking about starting a blog or getting a site for your small business, this looks like a sweet deal.

My Time Off The Grid

When I go off, I go waaaaaay off.

Very light blogging the last few days, but I’ve written a ton of code. All of it is really good, functionally, but nothing I’m prepared to share with the world. See, I have a tendency to get pretty sloppy with my programming unless I know exactly where I’m going before I start. In this case, I was always fixing this “one more thing”, and now I’ve got some cleaning up to do.

It’s not exactly spaghetti, but calling it ravioli would be fair. The reason I like to start any project with a definite roadmap of where I’m going is to avoid this exact situation.

Sun Buys MySQL

I don’t write about tech stuff here too often, but since this blog, and most likely yours*, is backended by MySQL, it’s relevant. MySQL’s business model works like this–it’s free (as in beer) to use, but enterprise level users do pay the company for support. That’s what makes it so great for the web. People can back end blogs, content management systems, bulletin boards, and just about anything else they can imagine using freely available open-source tools. In fact, there’s even an acronym for the most commonly used tools working together (LAMP–Linux, Apache, MySQL, and PHP). For the end user, more than likely nothing will change.

So why does it matter to us that Sun now owns it? Because the fact that Sun owns it means that Google, Microsoft, and Oracle don’t own it.

Story

*HM, I know you do your own blog engining…mad props.

Part II (of many) on SEO, Google, and Content–Technology Moves, Build For Change

First of all, I can’t take credit for all of these ideas. Lots of them have been borrowed from guys like Steve Pavlina who are basically saying the same thing I am.

One of the more important points Steve makes is that your content should be timeless. What he means by this is that if your content is pertinent only to what is going on today, there’s not much reason for people to want to look at it tomorrow. This is especially important as you try to build momentum for your site traffic over time.

Early on, your site will not be highly listed on any engines. With Google, you’ll be stuck in the “sandbox” for quite a while. While you may be providing great content that is extremely relevant for the day, week, or month it is published, your potential readers will never find it, at least from a search engine. Down the road, you may be lucky enough to be bumped up to a high ranking for the search terms, but it’s likely that no one will be searching for it.

Of course, there are exceptions. For instance, I maintain a site for my rugby club (www.knoxvillerugby.com) that contains scores and information about the club for the last few years. While the score of last Saturday’s match will get the majority of its traffic in the week following the match, there is a good chance that old guys who want to relive the glory days will one day come back to our site, possibly through a search engine, and read about what happened way back when. But, like I said, the majority of the traffic is going to come in the first week. This traffic is not search engine driven. It is driven by the fact that the site is reliable and updated in a timely manner. Not only do members of our club check our site regularly, but members of other clubs whose place in the league standings is tied to the results of our match check it as well.

So what do I mean by “build for change”? One could take that statement as a call to build in scalability and flexibility. While these are certainly important attributes to consider for your site, this actually isn’t what I’m talking about at all. The basic idea of what I’m saying is, don’t focus your efforts on search engines. Don’t focus on trying to get people to link to you. Don’t focus your energy on driving loads of traffic to your site today.

Focus on providing your customers with exactly what they want–good content that they want to come back for. All the rest will follow.

The biggest problem with relying on technology to drive your traffic is that technology is always changing. In 1999-2000, the .com boom, I was doing some work for a company that was selling its services to European companies to boost their rankings on search engines. Back then, Yahoo! ruled the roost, but they didn’t have nearly the market share that Google has now. People weren’t focused just on getting ranked highly on Yahoo!, but on every search engine. We were monitoring rankings on over 100 different search engines as well as checking for links on the highest traffic sites on the web. Our goal was to get our customers rated highly on ALL of these engines. In much the same way that Google’s PageRank system works now, each customer was assigned an indexed ranking based on their listing in the engines and the number of links to them that existed on high traffic sites.

Not exactly rocket surgery, but useful at the time. What wasn’t foreseen by my employer was the fact that one company was going to come in and basically take over the search industry. I was constantly asking, “what do we do when the situation changes?” I’ve long since parted ways with them, but I can imagine that their customers aren’t very thrilled with their Ask Jeeves rankings being in the top ten if their Google ranking is 97. I’m sure they’ve adjusted their product to account for this, but there are factors they didn’t see coming that I’m not sure they’ve dealt with. I would guess the most difficult problem they had to address is that not only did the dominant players in the game change, but the technology changed as well.

The way search engines work has changed drastically since 2000. Search engines are smarter (especially Google). Search engines are better equipped to handle rapidly changing sites. Most importantly, search engines are constantly changing and improving going forward.

Ironically, one of the tasks assigned to me way back then was to develop a “keyword generator”. Literally, those were the specifications I was given–“develop a keyword generator”. Now, my idea of a keyword generator and my boss’s were completely different, and frankly, my idea was a little ahead of its time.

My boss was very disappointed when I proudly showed him my software. He was expecting a tool that prompted the user to enter a few keywords, then spat back these same keywords with the <meta> tags around them.

He was actually a little angry when I demonstrated my app that spidered three layers into a site and returned suggestions for keywords based on frequently occurring words and weighted based on the page on which they occurred and their placement on the page.

Which sounds like it more accurately addresses how search engines work nowadays?

The point isn’t that he wanted me to write a tool with very little functionality (and there were a million of these already available). The problem was that he had no inkling that search engines could ever change or evolve and refused to consider it when confronted.
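For the curious, the approach I described (spider a few layers into a site, count frequently occurring words, and weight them by page and by placement on the page) can be sketched roughly like this. It is not the original app. To keep it self-contained it walks a hypothetical in-memory `site` dictionary instead of fetching real pages, and the stopword list and the specific weights (title words count triple, deeper pages count less) are assumptions I picked for illustration.

```python
import re
from collections import Counter, deque

# Assumed stopword list; a real tool would want a much longer one.
STOPWORDS = {"the", "and", "for", "are", "our", "with", "that", "this"}


def suggest_keywords(site, start, max_depth=3, top_n=5):
    """Suggest keywords for a site by crawling up to `max_depth` layers.

    site -- hypothetical stand-in for the web:
            url -> {"title": str, "body": str, "links": [url, ...]}
    """
    scores = Counter()
    seen = {start}
    queue = deque([(start, 1)])  # (url, layer), with the start page as layer 1

    while queue:
        url, depth = queue.popleft()
        page = site.get(url)
        if page is None:
            continue  # broken link, skip it

        page_weight = 1.0 / depth  # assumed: pages nearer the start matter more
        # Assumed placement weights: a word in the title counts triple.
        for field, placement_weight in (("title", 3.0), ("body", 1.0)):
            for word in re.findall(r"[a-z]+", page[field].lower()):
                if len(word) > 2 and word not in STOPWORDS:
                    scores[word] += page_weight * placement_weight

        if depth < max_depth:
            for link in page["links"]:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))

    return [word for word, _ in scores.most_common(top_n)]
```

Swapping the dictionary lookup for a real page fetch and an HTML parser is the obvious next step, and that’s where most of the actual work would be.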

We are facing a movement today that I predict will drastically change the game again. Social networking sites are becoming more popular by the minute as a way to find information. “Rankings” on sites like Digg, Del.icio.us, Reddit, etc. are driven completely by the users. Relevance and quality aren’t being decided by algorithms at all, but by actual people. So when 1,000 people “digg” your site, you better believe that there are thousands of others who are going to sit up and take notice of it.

As more and more people discover these sites and see the value in them, high rankings on these sites will become more and more important. Some people are well aware of this situation and are already coming up with ways to try to “game” these sites by falsifying user recommendations, and the sites are responding by banning domains that try to beat the system. I think a better approach is to focus on providing good, original content. You will not only increase your chances of finding good, loyal users, but you’ll also have built for the future.

We don’t know for sure what tomorrow will bring in search, social networking, or technologies that are still in their infancy. What we know for sure is that the goal of these technologies is always going to be finding and categorizing the best content out there.

Build quality into your site, and you can rest easy that you’ve also built for change.

Part I (of many) on SEO, Google, and Content

I’ve been reading up a lot lately on Search Engine Optimization (SEO), marketing, monetizing a blog, sandboxes, traffic generation, and blah, blah, blah, blah, blah.

From what I’ve read, what I feel in my gut, and what everything else I’ve done in life has taught me, I’ve come to a pretty simple conclusion–you gotta work for it, at least as far as content driven sites go. And just like everything else, if you put in the hard yards and take care of what you can control, the rest will take care of itself.

I’m doing a little experiment, which I’ll discuss in a separate post, to determine how much a very targeted SEO strategy can help a site that is basically without content. In contrast, I also have this site (not an experiment), where I plan on providing an abundance of relevant, original content that is focused on, well, nothing in particular. As I said, more on that later.

I’m not saying it doesn’t pay to be smart about SEO and to be aware of the existence of search engines. You would be stupid not to use keywords that are relevant (the important word here is relevant) to your site, and it is probably worth your while to do some research into the most common searches that occur for your target market. But in the end, the free market will determine whether or not your site is successful, not Google. Why? Because not only is your site market driven, but Google is market driven itself!

Maybe I’m a simpleton who isn’t looking at all the angles, but here goes…

How Google’s Market Relates to Your Market

Google’s goal is to provide its customers with relevant search results. The reason Google is the top search engine, and the reason everyone wants a high Google ranking, is that it actually does a good job at achieving this goal. People’s trust in Google to give them what they are looking for was brought about by Google’s ability to sort through the junk and provide relevant results. Google’s continued dominance relies on being smart enough to know which sites deliver relevant content and which sites are simply trying to trick the user into visiting the site in hopes of selling them something they aren’t looking for. If Google fails to perform, someone else will jump in and provide this service.

That’s the beauty of the free market–if it is technologically possible, the demand will be met. In fact, the technology actually drives the demand in this case. So Google not only has to worry about providing their users with a quality product right now, but they also have to work to continue to provide a quality product in the future or risk being upended by someone with better technology who does a better job.

In other words, if Google’s search engine is dumb enough for you to trick it into placing a crappy, irrelevant, get-rich-quick site high up the rankings, no one will want to use it anymore. If no one uses it anymore, what good is it for you to be ranked highly there? At that point, Google is no longer able to effectively connect you to your market.