Category Archives: PHP

AWS DynamoDB For Session Redundancy And Failover

This article covers what DynamoDB is and how we use it to back up our session data. It allows us to fail over from one datacenter (AWS region) to another without losing session data and logging people out of our system.

Amazon Web Services (AWS) DynamoDB

DynamoDB is a cloud NoSQL database hosted by Amazon. You simply create a table and set the read/write capacity you want and Amazon takes care of the rest. No servers to manage or scale. Pretty awesome. Actually it’s so awesome that Amazon DynamoDB is the fastest growing new service in the history of AWS.

What is DynamoDB good at?

  • No hassle data store that is performant, scalable, and reliable.
  • Quick reads/writes. You decide the performance you need and pay for that.
  • Relatively cheap.
  • Access data via a key.
  • Store data that is in the 1-10KB range per item (you can store more but it gets expensive).

What does DynamoDB suck at?

  • It’s not a relational database. You can’t query it with joins or complex selects.
  • It’s not as fast as memcache (in our experience).
  • For large items or extremely high throughput it can get expensive (compared to running your own memcache service for example).
  • There is no simple way to back up your data (they do have a process by which you can get data to S3 but it’s pretty complicated and involves two other totally separate services from AWS).

Where can you read more about DynamoDB?

Using DynamoDB For Session Backup

In the blog post Scalable Session Handling in PHP Using Amazon DynamoDB they cover how to implement session handling for PHP using DynamoDB. I experimented with this, but what I found was it was too slow for our needs. Our session reads with DynamoDB were taking 20+ms. Our memcache session reads are an order of magnitude faster than that. We also already have memcache session handling implemented and working beautifully.

Why Do We Need To Back Up Our Sessions?

For some applications it might be acceptable to log everyone out if you failed over from one datacenter to another (heck, lots of applications run in only one datacenter and don’t have any failover). Our application is mission critical for our customers – if we’re down they’re losing money. We run completely redundant setups in two different AWS regions on EC2.

If there’s a problem we have to switch customers from one datacenter to the other. Not having the sessions backed up in a way that both datacenters can access would mean everyone using our system would be logged out. This is a real hassle for our users because sometimes the stores are operating with a manager login and associates just have PINs to switch to their profile. If they get logged out they have to have the manager come by and log back in. What if the manager is out on lunch? Out of luck.

We previously stored our session backup in our MySQL database. But this became unscalable as the number of concurrent sessions grew. A few months ago we turned off our MySQL session storage and have been running just on memcache sessions. It improved performance, but it meant we might log everyone out if we had to switch datacenters.

DynamoDB To The Rescue

I looked at a number of different NoSQL type solutions for our session backup. DynamoDB made it to the top of the list because of its ease of management, scalability and price.

The basic concept is:

  • Sessions are stored in and read from Memcache (every page hit)
  • Every 15 minutes we write the session to DynamoDB (each session stores its time since last DynamoDB write)
  • If we can’t find a session in Memcache (datacenter failure, or Memcache reboot) we look for it in DynamoDB.
  • Result: Users aren’t logged out if we switch datacenters or reboot Memcache.
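
The flow above can be sketched in a few lines. This is only an illustration under stated assumptions: plain PHP arrays stand in for Memcache and DynamoDB, and the class name, property names, and the `$now` parameter are hypothetical (they are not part of our production code or of any AWS SDK).

```php
<?php
// Hypothetical sketch: arrays stand in for the real Memcache pool and DynamoDB table.
class SessionBackupSketch
{
    const BACKUP_INTERVAL = 900; // 15 minutes, in seconds

    public $memcache = array(); // stand-in for Memcache
    public $dynamo = array();   // stand-in for DynamoDB

    // Read path: Memcache first, DynamoDB only when Memcache misses.
    public function read($sesId)
    {
        if (isset($this->memcache[$sesId])) {
            return $this->memcache[$sesId]['data'];
        }
        if (isset($this->dynamo[$sesId])) {
            // Re-prime Memcache so the next page hit is fast again.
            $this->memcache[$sesId] = array(
                'data' => $this->dynamo[$sesId],
                'last_backup' => time(),
            );
            return $this->dynamo[$sesId];
        }
        return '';
    }

    // Write path: always Memcache; DynamoDB only when the last backup is stale.
    public function write($sesId, $data, $now = null)
    {
        $now = ($now === null) ? time() : $now;
        $lastBackup = isset($this->memcache[$sesId])
            ? $this->memcache[$sesId]['last_backup']
            : 0;
        if ($now - $lastBackup >= self::BACKUP_INTERVAL) {
            $this->dynamo[$sesId] = $data;
            $lastBackup = $now;
        }
        $this->memcache[$sesId] = array('data' => $data, 'last_backup' => $lastBackup);
        return true;
    }
}
```

Storing the last backup time alongside the session data means the "is it time to write to DynamoDB?" check costs nothing extra on a normal page hit.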

DynamoDB PHP Code Samples

I gleaned a lot of this code from the AWS blog post on PHP DynamoDB sessions.

readDynamoDB

This reads our session data out of DynamoDB when we need it. We call this from our custom session reading function (see session_set_save_handler)

function readDynamoDB($ses_id)
{
  $this->initDynamoDB();
  $result = '';
  $response = $this->_dynamodb->get_item(
    array( 'TableName' => self::DYNAMODB_TABLE,
           'Key' => array('HashKeyElement' => $this->_dynamodb->attribute($ses_id)),
           'ConsistentRead' => true, )
         );

  $node_name = 'Item';
  if ($response->isOK())
  {
    $item = array();
    // Get the data from the DynamoDB response
    if ($response->body->{$node_name})
    {
      foreach ($response->body->{$node_name}->children() as $key => $value)
      {
        $item[$key] = (string) current($value);
      }
    }
    if (isset($item['expires']) && isset($item['data']))
    {
      // Check the expiration date before using
      if ($item['expires'] > time())
      {
        $result = $item['data'];
      }
      else
      {
        $this->deleteDynamoDB($ses_id);
      }
    }
  }
  return $result;
}

writeDynamoDB

This writes our session data to DynamoDB when it is time for a backup. We call this from our custom session writing function (see session_set_save_handler)

function writeDynamoDB($ses_id,$data,$expire_minutes)
{
  $this->initDynamoDB();
  // Write the session data to DynamoDB
  $response = $this->_dynamodb->put_item(
    array( 'TableName' => self::DYNAMODB_TABLE,
           'Item' => $this->_dynamodb->attributes(
             array( self::DYNAMODB_HASH => $ses_id,
                    'expires' => time() + ($expire_minutes*60),
                    'data' => $data,
                   )
            ),
    )
  );
  return $response->isOK();
}

deleteDynamoDB

This deletes our session data from DynamoDB when we are done with it. We call this from our custom session destroy and gc functions (see session_set_save_handler)

function deleteDynamoDB($ses_id)
{
  $this->initDynamoDB();
  $delete_options = array( 'TableName' => self::DYNAMODB_TABLE,
                           'Key' => array('HashKeyElement' => $this->_dynamodb->attribute($ses_id)),
                         );
  // Send the delete request to DynamoDB
  $response = $this->_dynamodb->delete_item($delete_options);
  return $response->isOK();
}

Single Sign-On with OpenID and Google Part 2

A Quick Review

Wouldn’t it be nice if we didn’t have to remember so many passwords? With the proliferation of Software as a Service (SaaS) applications on the internet there are more and more things that we need to remember a username and password to access. This is where Single Sign-On (SSO) comes in. Single Sign-On is the idea that you can sign-on with one set of credentials and be signed-on to multiple services all at once.

In Part 1 we covered some of the basics of what Single Sign-On is and how we send end-users off from our site to their Identity Provider to verify their identity.

Getting the Id Back

Once the user has successfully logged in to their Identity Provider and decided to allow our site to know what their identity is, the user will be returned to the callback URL that we specified, with the necessary information passed along by the browser. We just need to verify it and decide what to do next.

In Part 1 we told the Identity Provider to send the end-user to a script called call_back.php to complete the process of logging in the user. Let’s start writing that script:

// See the examples/consumer/common.php file in OpenID Enabled library package.
// This provides some utility functions used below.
require_once 'common.php';
session_start();

// Create a new OpenID consumer to receive our Identity Provider response.
// To create the consumer we need to setup a store for the OpenID information
$store_path = '/tmp/_php_consumer_test';
$store = new Auth_OpenID_FileStore($store_path);
// Once we have a store for the information we need to create a new consumer.
// (plain assignment: the old "=& new" form is deprecated and a parse error on PHP 7+)
$consumer = new Auth_OpenID_Consumer($store);
// The GApps_OpenID_Discovery allows the consumer to find openid's for Google Apps.
new GApps_OpenID_Discovery($consumer);

// To complete the discovery process we need to pass the same callback url that we sent to the Identity Provider.
// This ensures that we can check the signature the Identity Provider sent us.
$server_url = 'http://myserver:80';
$response = $consumer->complete($server_url.'/callback.php');

We now have the response from the Identity Provider. With the response we can now check the status of the OpenID discovery and get the returned OpenID.

$openid = '';

switch($response->status)
{
case 'success':
	$openid = $response->endpoint->claimed_id;
	break;
case 'cancel':
case 'failure':
case 'setup_needed':
default:
	// If we got a status we don't understand we should take an appropriate action.
	break;
}

if (!empty($openid))
{
	// We have an openid, we need to check if it's associated with a login in our system.
}
else
{
	// We didn't get an openid, generate an error
}

What you do with the OpenID once you have it is dependent on how you handle logins on your site. You could check to see if the OpenID is already associated with an account on your site and if not then prompt the end-user to create a new account. In our case we only want to associate OpenIDs with existing end-user accounts. We store the OpenID in our database with an association to an existing account. If the OpenID is not associated then we prompt the user to enter their existing credentials (logging them in) so we can associate their OpenID with their account.
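
As a rough sketch of that decision, assuming the associations live in a simple OpenID-to-account map (the function name, the array shape, and the 'login' / 'associate' action labels are all hypothetical; in practice this would be a database lookup):

```php
<?php
// Hypothetical sketch: $associations maps a claimed OpenID to an existing account id.
function resolveOpenId(array $associations, $openid)
{
    if ($openid !== '' && isset($associations[$openid])) {
        // Known OpenID: log the user straight in to the linked account.
        return array('action' => 'login', 'account_id' => $associations[$openid]);
    }
    // Unknown OpenID: ask for existing credentials, then store the association.
    return array('action' => 'associate', 'account_id' => null);
}

$associations = array('http://example.com/id/1234' => 42);
$known = resolveOpenId($associations, 'http://example.com/id/1234');
$unknown = resolveOpenId($associations, 'http://example.com/id/9999');
```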

Round Up

Implementing Single Sign-On allows your users to access their data using one set of credentials, rather than having to remember a different set of credentials for each site. This means they can remember a few strong passwords rather than many weak ones (or one password reused across many sites).

For SaaS applications that target businesses, implementing SSO with Google makes it possible for you to get your application listed in the Google Apps Marketplace. This gives your application access to over 2 million businesses that use Google’s enterprise applications.

Single Sign-On with OpenID and Google Part 1

What is this “Single Sign-On”

Wouldn’t it be nice if we didn’t have to remember so many passwords? With the proliferation of Software as a Service (SaaS) applications on the internet there are more and more things that we need to remember a username and password to access. This is where Single Sign-On (SSO) comes in. Single Sign-On is the idea that you can sign-on with one set of credentials and be signed-on to multiple services all at once.

Right now the most popular SSO implementation is OpenID. OpenID is a standard that defines how SSO IDs look and how you find who can verify an SSO ID. OpenIDs typically look like a URL:

http://example.com/id/1234

But other services will accept an email address as the OpenID:

1234@example.com

In both cases example.com is the Identity Provider.

Gzip Compress Your Javascript/CSS And Serve With Apache

Browsers Do Not Cache Anything That Is Delivered Over SSL!

Did you know that? I sure didn’t. I figured that things like javascript and CSS files would be cached. But recently I was using Firebug to debug some page running under SSL and I noticed nothing was being cached. It was all re-downloaded on every single page load. Our web application has a lot of javascript; it’s well over 200 KB. Even on my very fast connection this added about 1.3 seconds to every page load! I did a little research and it turns out that by default no browser caches anything delivered over SSL, to protect your privacy.

Turning Off SSL For Javascript and CSS Made My App 4X Faster

Firefox about:config

If you are using Firefox there is a quick test you can do to see how your web application would run if it were caching javascript and CSS under SSL. Put “about:config” into your address bar. This will take you to your Firefox preferences list. Find browser.cache.disk_cache_ssl and set the value to true. Now try your website out. You should notice things are a lot more snappy.

about:config - browser.cache.disk_cache_ssl

Using Firebug To Measure Your Performance Gain

Firebug

If you aren’t using Firefox with Firebug go install it right now before you waste another second of your life debugging web pages the old fashioned way.

Go to your website and hit a page that’s on SSL that has a sizable Javascript and/or css file. Open up the Firebug console and switch to the “Net” tab. Refresh your page and note the bold number on the bottom right (this is the total time to load your page). Now turn on browser.cache.disk_cache_ssl in Firefox and then refresh the page (maybe twice). You should see the total time to load your page dramatically reduced. For myself my pages went from 1.6 seconds load time to 0.4 seconds.

Firebug Net Load Times

A PHP Fix

If you need your application to be portable (moved from one directory / URL to another without breaking) then you have been writing all your URLs as relative. For example you’ve been accessing your javascript like <script src="scripts/my_big_javascript_file.js"></script>. That means if your pages are running over SSL all your resources are loaded over SSL as well and not being cached = slow. Here’s my fix in PHP. It gets the absolute URL of the current page and changes it to http instead of https.

$my_http_url = "http://" .
str_replace("//","/",$_SERVER['HTTP_HOST'] .
dirname($_SERVER['REQUEST_URI']) . "/");

define("MY_HTTP_URL",$my_http_url);
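
To see what that produces for a given request, here is the same idea wrapped in a function that takes the host and request URI as arguments. This is a sketch, not the code we run: the name httpBaseUrl() and the example host are made up for illustration, and it goes one step further than the snippet above by using parse_url() to drop the query string before calling dirname(), on the assumption that a slash inside a query string should not count as part of the directory.

```php
<?php
// Hypothetical helper: builds the http:// base URL for the directory of the current page.
function httpBaseUrl($host, $requestUri)
{
    // Keep only the path so "?next=/x/y" cannot confuse dirname().
    $path = parse_url($requestUri, PHP_URL_PATH);
    // dirname() can return a backslash on Windows; normalise it to a forward slash.
    $dir = str_replace('\\', '/', dirname($path));
    // rtrim() avoids a double slash when the page lives in the web root.
    return 'http://' . $host . rtrim($dir, '/') . '/';
}

$fromSubdir = httpBaseUrl('www.example.com', '/app/checkout/cart.php');
$fromRoot = httpBaseUrl('www.example.com', '/index.php');
$withQuery = httpBaseUrl('www.example.com', '/a/b.php?next=/x/y');
```

With these inputs the three results are http://www.example.com/app/checkout/, http://www.example.com/ and http://www.example.com/a/.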

Now you can write your URL like: <script src="<?php echo MY_HTTP_URL;?>scripts/my_big_javascript_file.js"></script> and your javascript (or CSS etc.) will be loaded over http instead of https.

Apache Rewrite And SSL

One of the snags we ran into with this approach was our apache rewrite rules. We had some rules set that forced everything to SSL (because we didn’t want our users to accidentally login over unprotected http connections). So we changed the rules to exclude our javascript and css directories like this:

RewriteEngine on
RewriteCond %{HTTPS} !=on
RewriteCond %{REQUEST_URI} !^/css(/.*)$
RewriteCond %{REQUEST_URI} !^/js(/.*)$
RewriteRule ^/(.*) https://%{HTTP_HOST}/$1 [R,L]

The Downside

Broken SSL Lock Icon

As with everything in web development there are a few gotchas. The first and worst is that when you load a page that is on SSL (https) but some of the resources for that page load over non-SSL (http), you get a broken lock icon in the browser (a lock with a cross through it). If you’re running an ecommerce store where assuring your users that their credit card info is completely safe is essential, this may be a deal breaker for you. For us the benefits of having everything cached far outweighed this downside.

Internet Explorer Is Annoying

IE Mixed Content Security Warning
So everything is dandy if you are running in Firefox, Safari, or Opera. You simply have a lock icon with a cross through it. But when you step into the murky world of IE your beautiful cached javascript is completely ruined by an annoying warning dialog that pops up on every page. And unlike Firefox there is no way to turn off the warning.

I tried to find a way around this but according to Microsoft’s own documentation there is none.
The only fix I could figure out is to use conditional comments to load different resources for IE than all other browsers. So in my pages when I load my javascript and/or css it looks like this:

<![if !IE]>
<link href="<?php echo MY_HTTP_URL;?>css/global.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="<?php echo MY_HTTP_URL;?>js/dojo.js"></script>
<![endif]>

Further Reading Around The Web

  1. When caching is not caching
  2. Test Non-SSL Caching
  3. A JSP Taglib that compresses static resources like JavaScript or Cascading Style Sheets
  4. Browser.cache.disk_cache_ssl

Floating Point Comparisons In PHP and Javascript

I just had to make a quick post about this. Hopefully it will save someone the trouble I’ve been through!

Beware Comparing Floating Point Values Can Be Hazardous!

“The curse of the .0000000001 and .9999999999!”

Here’s the problem. Say you’re comparing two floating point numbers a (71.00) and b (71.00) to see if they are the same. The problem is if you’ve done any calculations to arrive at these numbers they might actually be stored as 71.00000000001. Now if one of them is stored that way and the other isn’t and you compare the two to see if they are equal you’ll get a FALSE as the response, even though they should be the same.

This isn’t a bug, it’s how floating point comparisons are designed to work. It makes it so people writing complex software that requires floating point arithmetic can do what they need to. Unfortunately for the rest of us that usually are just using floating point numbers to represent things like currency (dollars in my case) it really makes life difficult!
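
A quick way to see the problem for yourself (a minimal sketch; the variable names are just for illustration):

```php
<?php
// Two prices that "should" add up to exactly 0.30
$subtotal = 0.10 + 0.20;
$expected = 0.30;

// Direct comparison fails because the sum is stored as 0.30000000000000004
$exactlyEqual = ($subtotal == $expected);

// Rounding both sides to the cent before comparing gives the answer we expect
$equalToTheCent = (round($subtotal, 2) == round($expected, 2));

var_dump($exactlyEqual);   // bool(false)
var_dump($equalToTheCent); // bool(true)
```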

PHP Floating Point Comparison Made Easy

Here’s my php function that compares floating point numbers:

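function moneycomp($a,$comp,$b,$decimals=2) {
	// bccomp() compares numeric strings and truncates (it does not round) to the given scale,
	// so 70.9999999999 would compare as 70.99. Round to a fixed-point string first.
	$a = number_format((float)$a,$decimals,'.','');
	$b = number_format((float)$b,$decimals,'.','');
	$res = bccomp($a,$b,$decimals); // returns -1, 0 or 1
	switch ($comp) {
		case ">":
			return ($res==1);
		case ">=":
			return ($res==1 || $res==0);
		case "<":
			return ($res==-1);
		case "<=":
			return ($res==-1 || $res==0);
		default: // any unrecognised operator is treated as "=="
		case "==":
			return ($res==0);
	}
}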

// example usage: if ($total > $payments) ...
if (moneycomp($total,'>',$payments)) ...

// example usage: if ($debt <= $income) ...
if (moneycomp($debt,'<=',$income)) ...

// example usage: if ($cost == $price) ...
if (moneycomp($cost,'==',$price)) ...

So that makes life really simple. You just write your comparisons wrapped in the moneycomp() function.
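
If the bcmath extension isn’t available, another option is to compare whole cents as integers, which is the same trick the javascript version further down uses. A minimal sketch, with hypothetical helper names toCents() and centsEqual():

```php
<?php
// Hypothetical helper: convert a float amount to an integer number of cents (or smaller units).
function toCents($amount, $decimals = 2)
{
    // round() snaps 7099.99999999 and 7100.000000001 to the same whole number
    return (int) round($amount * pow(10, $decimals));
}

// Hypothetical helper: equality check on the integer values.
function centsEqual($a, $b, $decimals = 2)
{
    return toCents($a, $decimals) === toCents($b, $decimals);
}

$sameAmount = centsEqual(70.9999999999, 71.00000000001); // both become 7100
$differentAmount = centsEqual(70.99, 71.00);             // 7099 vs 7100
```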

Javascript Floating Point Comparison Made Easy

And here’s my javascript function that compares floating point numbers:

function moneycomp(a,comp,b,decimals) {
	if (!decimals)
		decimals = 2;
	var multiplier = Math.pow(10,decimals);
	a = Math.round(a * multiplier); // multiply to do integer comparison instead of floating point
	b = Math.round(b * multiplier);
	switch (comp) {
		case ">":
			return (a > b);
		case ">=":
			return (a >= b);
		case "<":
			return (a < b);
		case "<=":
			return (a <= b);
		case "==":
			return (a == b);
	}
	return null;
}

// example usage: if (total > payments) ...
if (moneycomp(total,'>',payments)) ...

// example usage: if (debt <= income) ...
if (moneycomp(debt,'<=',income)) ...

// example usage: if (cost == price) ...
if (moneycomp(cost,'==',price)) ...

Resources And Further Reading

Thanks for visiting. Stay tuned, I’m about to release a cool SEO tool I’ve been working on and using =).

Scraping Links With PHP

MerchantOS, the web’s fastest growing point of sale software company, is proud to bring you this article. If you like this one, you may also like these: Facebook PHP Tutorial and Spyjax Internet History Software.


In this tutorial you will learn how to build a PHP script that scrapes links from any web page.

What You’ll Learn

  1. How to use cURL to get the content from a website (URL).
  2. Call PHP DOM functions to parse the HTML so you can extract links.
  3. Use XPath to grab links from specific parts of a page.
  4. Store the scraped links in a MySQL database.
  5. Put it all together into a link scraper.
  6. What else you could use a scraper for.
  7. Legal issues associated with scraping content.

What You Will Need

  • Basic knowledge of PHP and MySQL.
  • A web server running PHP 5.
  • The cURL extension for PHP.
  • MySQL – if you want to store the links.

Get The Page Content

cURL is a great tool for making requests to remote servers in PHP. It can imitate a browser in pretty much every way. Here’s the code to grab our target site content:

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
if (!$html) {
	echo "<br />cURL error number:" .curl_errno($ch);
	echo "<br />cURL error:" . curl_error($ch);
	exit;
}

If the request is successful $html will be filled with the content of $target_url. If the call fails then we’ll see an error message about the failure.

curl_setopt($ch, CURLOPT_URL,$target_url);

This line determines what URL will be requested. For example if you wanted to scrape this site you’d have $target_url = “/makebeta/”. I won’t go into the rest of the options that are set (except for CURLOPT_USERAGENT – see below). You can read an in depth tutorial on PHP and cURL here.

Tip: Fake Your User Agent

Many websites won’t play nice with you if you come knocking with the wrong User Agent string. What’s a User Agent string? It’s part of every request to a web server that tells it what type of agent (browser, spider, etc) is requesting the content. Some websites will give you different content depending on the user agent, so you might want to experiment. You do this in cURL with a call to curl_setopt() with CURLOPT_USERAGENT as the option:

$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);

This would set cURL’s user agent to mimic Google’s. You can find a comprehensive list of user agents here: User Agents.

Common User Agents

I’ve done a bit of the leg work for you and gathered the most common user agents:

Search Engine User Agents

  • Google – Googlebot/2.1 (+http://www.googlebot.com/bot.html)
  • Google Image – Googlebot-Image/1.0 (+http://www.googlebot.com/bot.html)
  • MSN Live – msnbot-Products/1.0 (+http://search.msn.com/msnbot.htm)
  • Yahoo – Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
  • ask

Browser User Agents

  • Firefox (WindowsXP) – Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
  • IE 7 – Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)
  • IE 6 – Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
  • Safari – Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/522.11 (KHTML, like Gecko) Safari/3.0.2
  • Opera – Opera/9.00 (Windows NT 5.1; U; en)

Using PHP’s DOM Functions To Parse The HTML

PHP provides a really cool tool for working with HTML content: DOM Functions. The DOM Functions allow you to parse HTML (or XML) into an object structure (or DOM – Document Object Model). Let’s see how we do it:

$dom = new DOMDocument();
@$dom->loadHTML($html);

Wow, is it really that easy? Yes! Now we have a nice DOMDocument object that we can use to access everything within the HTML in a nice clean way. I discovered this over at Russell Beattie’s post on: Using PHP To Scrape Sites As Feeds, thanks Russell!

Tip: You may have noticed I put @ in front of loadHTML(). This suppresses some annoying warnings that the HTML parser throws on many pages that have non-standards-compliant code.
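
If you would rather not use @ (it hides every error from that call, not just parser warnings), libxml can collect the warnings for you instead. A minimal sketch, using a made-up snippet of sloppy HTML as input:

```php
<?php
// Deliberately sloppy markup: an unclosed <p> and a tag that does not exist in HTML
$html = '<html><body><p>Unclosed paragraph<a href="/one">One</a><bogus>tag</bogus></body></html>';

// Collect parser warnings internally instead of emitting them
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Inspect the collected warnings if you care about them, then clear the buffer
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore whatever error handling was in effect before
libxml_use_internal_errors($previous);

$linkCount = $dom->getElementsByTagName('a')->length;
```

The document still parses and the link is still found; the warnings simply end up in $errors rather than on the page.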

XPath Makes Getting The Links You Want Easy

Now for the real magic of the DOM: XPath! XPath allows you to gather collections of DOM nodes (otherwise known as tags in HTML). Say you want to only get links that are within unordered lists. All you have to do is write a query like “/html/body//ul//li//a” and pass it to XPath->evaluate(). I’m not going to go into all the ways you can use XPath because I’m just learning myself and someone else has already made a great list of examples: XPath Examples. Here’s a code snippet that will just get every link on the page using XPath:

$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
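
To see the difference between the two queries mentioned above, here is a small self-contained sketch that runs both against a made-up page (the HTML and variable names are only for illustration):

```php
<?php
// A tiny page with one link outside a list and two links inside an unordered list
$html = '<html><body>'
      . '<a href="/home">Home</a>'
      . '<ul><li><a href="/one">One</a></li>'
      . '<li><span><a href="/two">Two</a></span></li></ul>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Only links that sit somewhere inside a <li> of a <ul>
$listLinks = array();
foreach ($xpath->evaluate('/html/body//ul//li//a') as $a) {
    $listLinks[] = $a->getAttribute('href');
}

// Every link in the body
$allLinks = array();
foreach ($xpath->evaluate('/html/body//a') as $a) {
    $allLinks[] = $a->getAttribute('href');
}
```

Here $listLinks ends up holding /one and /two, while $allLinks also includes /home.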

Next we’ll iterate through all the links we’ve gathered using XPath and store them in a database. First the code to iterate through the links:

for ($i = 0; $i < $hrefs->length; $i++) {
	$href = $hrefs->item($i);
	$url = $href->getAttribute('href');
	storeLink($url,$target_url);
}

$hrefs is an object of type DOMNodeList and item() is a function that returns a DOMNode object for the specified index. The index can be between 0 and $hrefs->length - 1. So we’ve got a loop that retrieves each link as a DOMNode object.

$url = $href->getAttribute('href');

The nodes returned for <a> tags are DOMElement objects (DOMElement extends DOMNode), and the DOMElement class provides the getAttribute() function. getAttribute() returns any attribute of the node (in this case an <a> tag with the href attribute). Now we’ve got our URL and we can store it in the database.

We’ll want a database table that looks something like this:

CREATE TABLE `links` (
`url` TEXT NOT NULL ,
`gathered_from` TEXT NOT NULL ,
`time_stamp` TIMESTAMP NOT NULL
);

We’ll need a storeLink() function to put the links in the database. I’ll assume you know the basics of how to connect to a database (If not grab a MySQL & PHP tutorial here).

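function storeLink($url,$gathered_from) {
	// escape both values so a quote in a URL can't break (or inject into) the query
	$url = mysql_real_escape_string($url);
	$gathered_from = mysql_real_escape_string($gathered_from);
	$query = "INSERT INTO links (url, gathered_from) VALUES ('$url', '$gathered_from')";
	mysql_query($query) or die('Error, insert query failed');
}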
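
The mysql_* functions used here were removed in PHP 7, so on a newer server you would need something else. One option is PDO with a prepared statement, sketched below. To keep the example runnable on its own it uses an in-memory SQLite database (an assumption made purely for the demo; for MySQL you would swap in a mysql: DSN with your own credentials), and the function name storeLinkPdo() is hypothetical.

```php
<?php
// Demo connection: in-memory SQLite so the sketch runs without a MySQL server
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec(
    'CREATE TABLE links ('
    . 'url TEXT NOT NULL, '
    . 'gathered_from TEXT NOT NULL, '
    . 'time_stamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)'
);

// Hypothetical PDO version of storeLink(): placeholders keep the values out of the SQL text
function storeLinkPdo(PDO $pdo, $url, $gatheredFrom)
{
    $stmt = $pdo->prepare(
        'INSERT INTO links (url, gathered_from) VALUES (:url, :gathered_from)'
    );
    return $stmt->execute(array(':url' => $url, ':gathered_from' => $gatheredFrom));
}

// A URL containing a quote is stored as-is, with no manual escaping
storeLinkPdo($pdo, "http://example.com/o'reilly", 'http://example.com/');
$count = (int) $pdo->query('SELECT COUNT(*) FROM links')->fetchColumn();
$storedUrl = $pdo->query('SELECT url FROM links')->fetchColumn();
```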

Your Completed Link Scraper

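function storeLink($url,$gathered_from) {
	// escape both values so a quote in a URL can't break (or inject into) the query
	$url = mysql_real_escape_string($url);
	$gathered_from = mysql_real_escape_string($gathered_from);
	$query = "INSERT INTO links (url, gathered_from) VALUES ('$url', '$gathered_from')";
	mysql_query($query) or die('Error, insert query failed');
}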

$target_url = "//";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
if (!$html) {
	echo "<br />cURL error number:" .curl_errno($ch);
	echo "<br />cURL error:" . curl_error($ch);
	exit;
}

// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
	$href = $hrefs->item($i);
	$url = $href->getAttribute('href');
	storeLink($url,$target_url);
	echo "<br />Link stored: $url";
}

What Else Could I Do With This Thing?

The possibilities are limitless. For starters you might want to store a list of sites that you want scraped in a database and then set up the script so it runs on a regular basis to scrape those sites. You could then compare the link structure over time or maybe republish the links in some sort of directory. Leave a comment below and say what you’re using this script for. Here are a few other things people have done with scrapers in the past:

Is Scraping Content Legal?

There is no easy answer to this question. Many organizations scrape content from all over the web – Google, Yahoo, Microsoft, and many others. These companies get away with it under fair use and because site owners want to be included in the search results. However, there have been copyright infringement rulings against these companies.

The real answer is that it depends who you scrape and what you do with the content. Basic copyright law gives authors an automatic copyright on everything they create. But the same law permits fair use of copyrighted material. Fair use includes: criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research. But even these uses could be considered copyright infringement in some circumstances. So be careful before you claim “fair use” as your defense!

Here are a couple of sites that have granted you the right to use their content. They do require you to attribute the content to the author or the URL you scraped it from:

  • Wikipedia – GNU Free Documentation License
  • Open Directory Project – Open Directory License
  • Creative Commons – Creative Commons Attribution 3.0

    Many sites publish their content under some form of the Creative Commons license. You can search for creative commons licensed works here: Creative Commons Search. Remember that it’s your responsibility to verify the copyright rules for anything you use, even stuff found using the Creative Commons Search.

MerchantOS and PHP Point of Sale

Update 5/26/11

Sadly PHP Point of Sale is no longer open source. The owner of PHP Point of Sale has decided to take it commercial and not offer the source code for free anymore.

If you are looking for cloud based point of sale software based on open source technology with an open API for customization, then check out MerchantOS.


PHP Point of Sale is an open source project created by Chris Muench. I had the good fortune of talking to Chris about web based point of sale and open source projects. Chris has done a great job with PHP Point of Sale, especially considering he’s done it while attending college and holding down a part time job.
