<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>niden.net &#187; Storage</title>
	<atom:link href="http://www.niden.net/category/storage/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.niden.net</link>
	<description>Boldly goes where no coder has gone before... and other ramblings :)</description>
	<lastBuildDate>Tue, 07 Sep 2010 04:10:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Create an inexpensive hourly remote backup [How-To]</title>
		<link>http://www.niden.net/2010/08/how-to-create-an-inexpensive-hourly-remote-backup/</link>
		<comments>http://www.niden.net/2010/08/how-to-create-an-inexpensive-hourly-remote-backup/#comments</comments>
		<pubDate>Sun, 22 Aug 2010 03:03:45 +0000</pubDate>
		<dc:creator>Nikolaos Dimopoulos</dc:creator>
				<category><![CDATA[Backup]]></category>
		<category><![CDATA[Gentoo]]></category>
		<category><![CDATA[How-To]]></category>
		<category><![CDATA[Installation]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.niden.net/?p=1224</guid>
		<description><![CDATA[There are two kinds of people, those who backup regularly, and those that never had a hard drive fail. I love the above quote. It is so true and I believe everyone should evaluate how much their data (emails, documents, files) is worth to them and, based on that value, create a backup strategy that suits them. I know for sure that if I ever lost the pictures and videos of my family I would be devastated since those are irreplaceable.]]></description>
			<content:encoded><![CDATA[<blockquote><p>There are two kinds of people, those who backup regularly, and those that never had a hard drive fail</p></blockquote>
<p>As you can tell <a href="http://www.niden.net/2010/08/subversion-backup-how-to/">the above is my favorite quote</a>. It is so true and I believe everyone should evaluate how much their data (emails, documents, files) is worth to them and, based on that value, create a backup strategy that suits them. I know for sure that if I ever lost the pictures and videos of my family I would be devastated since those are irreplaceable.</p>
<p>So the question is how can I have an inexpensive backup solution? All my documents and emails are stored in Google, since my domain is on <a href="http://www.google.com/a/">Google Apps</a>. What happens, though, to the live/development servers that host all my work? I program on a daily basis and the code has to be backed up regularly so that a hard drive failure does not cost me time and money.</p>
<p>So here is my solution. I have an old computer (IBM ThinkCentre) which I decided to beef up a bit. I bought 4GB of RAM for it from eBay for less than $100. Although this was not necessary since my solution would be based on Linux (<a href="http://www.gentoo.org">Gentoo</a> in particular), I wanted to have faster compilation times for packages.</p>
<p>I bought two external drives (750GB and 500GB respectively) and one 750GB internal drive. I already have a 120GB hard drive in the computer. The two external ones are connected to the computer using USB while the internal ones are connected using SATA.</p>
<p>The external drives are formatted using NTFS while the rest of the system uses ReiserFS.</p>
<p>Here is the approach:</p>
<ul>
<li>I have installed and have a working Gentoo installation on the machine</li>
<li>I have an active Internet connection</li>
<li>I have installed LVM on the machine and set up the core system on the 120GB drive while the 500GB drive is on LVM</li>
<li>I have 300GB active on the LVM (from the available 500GB)</li>
<li>I have generated a public SSH key (I will need this to exchange it with the target servers)</li>
<li>I have mounted the internal 500GB drive to the <strong>/storage</strong> folder</li>
<li>I have mounted the external USB 750GB drive to the <strong>/backup_hourly</strong> folder</li>
<li>I have mounted the external USB 500GB drive to the <strong>/backup_daily</strong> folder</li>
</ul>
<p>Here is how my backup works:</p>
<p>Every hour a script runs. The script uses rsync to synchronize files and folders from a remote server to the local machine. Those files and folders are kept in subfolders of the <strong>/storage</strong> folder (remember this is my LVM), one per server and named after it. So for instance my subfolders will be <strong>/storage/beryllium.niden.net</strong>, <strong>/storage/nitrogen.niden.net</strong>, <strong>/storage/argon.niden.net</strong> etc.</p>
<p>Once the rsync completes, the script continues by compressing the relevant &#8216;server&#8217; folder and creates the compressed file with a date-time stamp in its name.</p>
<p>When all compressions are completed, if the time that the script has executed is midnight, the backups are moved from the <strong>/storage</strong> folder to the <strong>/backup_daily</strong> folder (which has the external USB 500GB drive mounted). If it is any other time, the files are moved to the <strong>/backup_hourly</strong> folder (which has the external USB 750GB drive mounted).</p>
<p>This way I ensure that I keep a lot of backups (daily and hourly ones). The backups are being recycled, so older ones get deleted. The amount of data that you need to archive as well as the storage space you have available dictate how far back you can go in your hourly and daily cycles.</p>
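<p>The routing rule described above can be sketched as a small function. This is only an illustration: <strong>pick_destination</strong> is a hypothetical helper that does not appear in the script, and it looks at nothing but the hour of the run.</p>
<pre>#!/bin/bash
# Hypothetical helper: pick the destination folder from the hour (00-23).
# The folder names mirror the mount points listed earlier.
pick_destination() {
    local hour="$1"
    if [ "$hour" = "00" ]; then
        echo "/backup_daily"
    else
        echo "/backup_hourly"
    fi
}

# Destination for a run that starts right now
pick_destination "$(date +%H)"</pre>
<p>Called with 00 it prints /backup_daily; any other hour gives /backup_hourly.</p>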
<p>So let&#8217;s get down to business. The script itself:</p>
<pre>#!/bin/bash
DATE=`date +%Y-%m-%d-%H-%M`
DATE2=`date +%Y-%m-%d`
DATEBACK_HOUR=`date --date='6 days ago' +%Y-%m-%d`
DATEBACK_DAY=`date --date='60 days ago' +%Y-%m-%d`
FLAGS="--archive --verbose --numeric-ids --delete --rsh='ssh'"
BACKUP_DRIVE="/storage"
DAY_USB_DRIVE="/backup_daily"
HOUR_USB_DRIVE="/backup_hourly"</pre>
<p>These are some variables that I need for the script to work. <strong>DATE</strong> and <strong>DATE2</strong> are used to date/time stamp the backups, while the <strong>DATEBACK_</strong>* variables are used to clear previous backups. In this case hourly backups are cleared after 6 days and daily ones after 60 days (for my system). These can be set to whatever you want provided that you do not run out of space.</p>
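<p>To see what the cut-off dates look like, here is a small sketch of the same date arithmetic. It assumes GNU date (the <strong>--date</strong> option is not available in every implementation), and <strong>cutoff_date</strong> is a hypothetical helper that takes a fixed reference day so that the result is reproducible; the script itself always counts back from the current day.</p>
<pre>#!/bin/bash
# Hypothetical helper: print the date N days before a reference day.
# Assumes GNU date; --utc keeps daylight saving changes out of the arithmetic.
cutoff_date() {
    local reference="$1"   # e.g. 2010-08-22
    local days="$2"        # e.g. 6
    date --utc --date="$reference $days days ago" +%Y-%m-%d
}

cutoff_date 2010-08-22 6     # cut-off for the hourly backups
cutoff_date 2010-08-22 60    # cut-off for the daily backups</pre>
<p>Each run only removes the archives stamped with exactly that one day, so the cut-off moves forward one day at a time.</p>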
<p>The <strong>FLAGS</strong> variable keeps the rsync command options while the <strong>BACKUP_DRIVE</strong>, <strong>DAY_USB_DRIVE</strong> and <strong>HOUR_USB_DRIVE</strong> hold the locations of the rsync folders, daily backup and hourly backup storage areas.</p>
<p>The script works with arrays. I have 4 arrays to do the work and 3 of them must have exactly the same number of elements.</p>
<pre># RSync Information
rsync_info[1]="beryllium.niden.net html rsync"
rsync_info[2]="beryllium.niden.net db rsync"
rsync_info[3]="nitrogen.niden.net html rsync"
rsync_info[4]="nitrogen.niden.net db rsync"
rsync_info[5]="nitrogen.niden.net svn rsync"
rsync_info[6]="argon.niden.net html rsync"</pre>
<p>This is the first array, which holds a description of each rsync step (which server and what is being synchronized). This gets appended to the log and helps me identify what step I am in.</p>
<pre># RSync Source Folders
# A trailing slash makes rsync copy the contents of the folder rather than the folder itself
rsync_source[1]="beryllium.niden.net:/var/www/localhost/htdocs/"
rsync_source[2]="beryllium.niden.net:/niden_backup/db/"
rsync_source[3]="nitrogen.niden.net:/var/www/localhost/htdocs/"
rsync_source[4]="nitrogen.niden.net:/niden_backup/db/"
rsync_source[5]="nitrogen.niden.net:/niden_backup/svn/"
rsync_source[6]="argon.niden.net:/var/www/localhost/htdocs/"</pre>
<p>This array holds the source host and folder. Remember that I have already exchanged SSH keys with each server, therefore when the script runs there is a direct connection to the source server without a password prompt. If you log in to a server as a different user, alter the contents of the rsync_source array so that it names that user (<strong>user@host:/folder/</strong>). A password cannot go in that string; rsync over SSH relies on the keys, which is what lets the script run unattended.</p>
<pre># RSync Target Folders
rsync_target[1]="beryllium.niden.net/html/"
rsync_target[2]="beryllium.niden.net/db/"
rsync_target[3]="nitrogen.niden.net/html/"
rsync_target[4]="nitrogen.niden.net/db/"
rsync_target[5]="nitrogen.niden.net/svn/"
rsync_target[6]="argon.niden.net/html/"</pre>
<p>This array holds the target locations for the rsync. These folders exist in my case under the <strong>/storage</strong> folder.</p>
<pre># GZip target files
servers[1]="beryllium.niden.net"
servers[2]="nitrogen.niden.net"
servers[3]="argon.niden.net"</pre>
<p>This array holds the names of the folders to be archived. These are the folders directly under the <strong>/storage</strong> folder and I am also using this array for the prefix of the compressed files. The suffix of the compressed files is a date/time stamp.</p>
<p>Here is how the script proceeds:</p>
<pre>echo "BACKUP START"  &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
date &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log

echo "BACKUP START"  &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
date  &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log

# Loop through the RSync process
element_count=${#rsync_info[@]}
let "element_count = $element_count + 1"
index=1
while [ "$index" -lt "$element_count" ]
do
    echo ${rsync_info[$index]} &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    rsync $FLAGS ${rsync_source[$index]} $BACKUP_DRIVE/${rsync_target[$index]} &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    let "index = $index + 1"
done</pre>
<p>The snippet above loops through the <strong>rsync_info</strong> array and prints out the information in the log file. Right after that it uses the <strong>rsync_source</strong> and <strong>rsync_target</strong> arrays (as well as the <strong>FLAGS</strong> variable) to rsync the contents of the source server with the local folder. Remember that all three arrays have to be identical in size (<strong>rsync_info</strong>, <strong>rsync_source</strong>, <strong>rsync_target</strong>).</p>
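<p>Since a mismatch between the three arrays would pair a source with the wrong target, a guard before the loop can help. The following is a minimal sketch and not part of the original script: <strong>same_size</strong> is a hypothetical helper and the host name is a placeholder.</p>
<pre>#!/bin/bash
# Hypothetical guard: succeed only when all the given counts are equal.
same_size() {
    local first="$1"
    local count
    for count in "$@"; do
        if [ "$count" -ne "$first" ]; then
            return 1
        fi
    done
    return 0
}

# Sample data in the same shape as the arrays above (placeholder host)
rsync_info[1]="example.org html rsync"
rsync_source[1]="example.org:/var/www/localhost/htdocs/"
rsync_target[1]="example.org/html/"

if same_size "${#rsync_info[@]}" "${#rsync_source[@]}" "${#rsync_target[@]}"; then
    echo "arrays match"
else
    echo "array sizes differ, aborting"
    exit 1
fi</pre>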
<p>The next thing to do is compress the data (I loop through the servers array):</p>
<pre># Looping to GZip data
element_count=${#servers[@]}
let "element_count = $element_count + 1"
index=1
while [ "$index" -lt "$element_count" ]
do
    echo "GZip ${servers[$index]}" &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    tar cvfz $BACKUP_DRIVE/${servers[$index]}-$DATE.tgz $BACKUP_DRIVE/${servers[$index]} &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    let "index = $index + 1"
done</pre>
<p>The compression method I use is tar/gzip. I found it to be fast with a good compression ratio. You can choose anything you like.</p>
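<p>One variation worth knowing is the <strong>-C</strong> option of tar, which changes to the parent folder first so that the archive stores paths relative to it. The sketch below uses it on a throwaway folder; <strong>archive_server</strong> is a hypothetical helper and example.org is a placeholder, so treat it as an illustration rather than a drop-in replacement for the loop above.</p>
<pre>#!/bin/bash
# Hypothetical helper: archive one server folder into a date-stamped .tgz.
# -C switches to the parent folder so the archive holds relative paths.
archive_server() {
    local base="$1"     # e.g. /storage
    local server="$2"   # e.g. beryllium.niden.net
    local stamp="$3"    # e.g. 2010-08-22-03-00
    tar -czf "$base/$server-$stamp.tgz" -C "$base" "$server"
}

# Demo on a temporary folder instead of the real /storage
demo=$(mktemp -d)
mkdir -p "$demo/example.org/html"
touch "$demo/example.org/html/index.html"
archive_server "$demo" example.org 2010-08-22-03-00

# List what ended up in the archive
tar -tzf "$demo/example.org-2010-08-22-03-00.tgz"</pre>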
<p>Now I need to delete old files from the USB drives and copy the new archives to them. I use the servers array again.</p>
<pre># Looping to copy the produced files (if applicable) to the daily drive
element_count=${#servers[@]}
let "element_count = $element_count + 1"
index=1

while [ "$index" -lt "$element_count" ]
do
    # Copy the midnight files
    echo "Removing old daily midnight files" &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    rm -f $DAY_USB_DRIVE/${servers[$index]}/${servers[$index]}-$DATEBACK_DAY*.* &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    echo "Copying daily midnight files" &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    cp -v $BACKUP_DRIVE/${servers[$index]}-$DATE2-00-*.tgz $DAY_USB_DRIVE/${servers[$index]}  &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    rm -f $BACKUP_DRIVE/${servers[$index]}-$DATE2-00-*.tgz &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log

    # Now copy the files in the hourly
    echo "Removing old hourly files" &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    rm -f $HOUR_USB_DRIVE/${servers[$index]}/${servers[$index]}-$DATEBACK_HOUR*.* &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    echo "Copying hourly files" &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    cp -v $BACKUP_DRIVE/${servers[$index]}-$DATE.tgz $HOUR_USB_DRIVE/${servers[$index]} &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    # Remove the hourly file from the /storage drive
    rm -f $BACKUP_DRIVE/${servers[$index]}-$DATE.tgz &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log
    let "index = $index + 1"
done

echo "BACKUP END"  &gt;&gt; $BACKUP_DRIVE/logs/$DATE.log</pre>
<p>The last part of the script loops through the servers array and:</p>
<ol>
<li>Deletes the old files (recycling of space) from the daily backup drive (<strong>/backup_daily</strong>) according to the <strong>DATEBACK_DAY</strong> variable. Because <strong>rm</strong> runs with <strong>-f</strong>, nothing is reported if no such files exist.</li>
<li>Copies the daily midnight file to the daily drive. If the file does not exist, <strong>cp</strong> simply prints a warning to standard error, so it does not reach the log file, which only captures standard output. I do not worry about warnings of this kind and was too lazy to use an IF EXISTS condition.</li>
<li>Removes the daily midnight file from the <strong>/storage</strong> drive.</li>
</ol>
<p>The reason I am using copy and then remove instead of the move (<strong>mv</strong>) command is that I have found this method to be faster.</p>
<p>Finally the same thing happens with the hourly files:</p>
<ol>
<li>Old files are removed (<strong>DATEBACK_HOUR</strong> variable)</li>
<li>Hourly file gets copied to the <strong>/backup_hourly</strong> drive</li>
<li>Hourly file gets deleted from the <strong>/storage</strong> drive</li>
</ol>
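<p>The recycling step works purely on file names: anything stamped with the cut-off day is removed. Here is a sketch of that idea on a temporary folder, with <strong>remove_dated</strong> as a hypothetical helper and example.org as a placeholder server name.</p>
<pre>#!/bin/bash
# Hypothetical helper: delete the archives of one server that carry a given day stamp.
remove_dated() {
    local folder="$1"   # e.g. /backup_hourly/beryllium.niden.net
    local server="$2"   # e.g. beryllium.niden.net
    local day="$3"      # e.g. 2010-08-16
    rm -f "$folder/$server-$day"*.tgz
}

# Demo on a temporary folder holding one old and one recent archive
demo=$(mktemp -d)
touch "$demo/example.org-2010-08-16-03-00.tgz" "$demo/example.org-2010-08-22-03-00.tgz"
remove_dated "$demo" example.org 2010-08-16

# Only the recent archive is left
ls "$demo"</pre>
<p>Keep in mind that with this scheme an archive is only ever removed on the one day that matches its stamp, so if the machine is off on that day the old files stay until you clear them by hand.</p>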
<p>All I need now is to add the script to my crontab and let it run every hour.</p>
<p><strong>NOTE</strong>: The first time you run the script you will need to do it manually (not in a cron job). The reason is that the first time rsync will need to download all the contents of the source servers/folders to the <strong>/storage</strong> drive so as to create an exact mirror. Once that lengthy step is done, the script can be added to the crontab. Subsequent runs of the script will only transfer the files that changed and remove the ones that were deleted.</p>
<p>This method can be very effective while not using a ton of bandwidth every hour. I have used this method for the best part of a year now and it has saved me a couple of times.</p>
<p>The last thing I need to present is the backup script that I have for my databases. As you can see above, the source folder for the databases of beryllium.niden.net is <strong>/niden_backup/db/</strong>. What I do is dump and compress the databases every hour on my servers. Although this is not a very efficient way of doing things and it adds to the bandwidth consumption every hour (since each dump creates new files that have to be transferred), I have the following script running on my database servers every hour at the 45th minute:</p>
<pre>#!/bin/bash

DBUSER=mydbuser
DBPASS='dbpassword'
DBHOST=localhost
BACKUPFOLDER="/niden_backup"
DBNAMES="`mysql --user=$DBUSER --password=$DBPASS --host=$DBHOST --batch --skip-column-names -e "show databases"| sed 's/ /%/g'`"
OPTIONS="--quote-names --opt --compress "

# Clear the backup folder
rm -fR $BACKUPFOLDER/db/*.*

for i in $DBNAMES; do
    echo Dumping Database: $i
    mysqldump --user=$DBUSER --password=$DBPASS --host=$DBHOST $OPTIONS $i &gt; $BACKUPFOLDER/db/$i.sql
    tar cvfz $BACKUPFOLDER/db/$i.tgz $BACKUPFOLDER/db/$i.sql
    rm -f $BACKUPFOLDER/db/$i.sql
done</pre>
<p>That&#8217;s it.</p>
<p>The backup script can be downloaded <a href="http://www.niden.net/wp-content/uploads/2010/01/hourly_backup.txt">here</a>.</p>
<p>Update: The metric units for the drives were GB not MB. Thanks to <a href="http://www.codeutopia.net">Jani Hartikainen</a> for pointing it out.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.niden.net/2010/08/how-to-create-an-inexpensive-hourly-remote-backup/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Google Paid Storage [Review]</title>
		<link>http://www.niden.net/2009/11/google-paid-storage/</link>
		<comments>http://www.niden.net/2009/11/google-paid-storage/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 21:01:54 +0000</pubDate>
		<dc:creator>Nikolaos Dimopoulos</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Online Storage]]></category>
		<category><![CDATA[Picasa]]></category>
		<category><![CDATA[Review]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.niden.net/?p=74</guid>
		<description><![CDATA[A week or so ago I read a blog post in my Google Reader about Google providing now more storage for less money. To be quite frank I did not read the whole post but I did get the message. Google was offering 10GB for $20.00 and now they are offering 20GB for $5.00. This [...]]]></description>
			<content:encoded><![CDATA[<p>A week or so ago I read a blog post in my Google Reader about Google now providing <a href="http://googleblog.blogspot.com/2009/11/twice-storage-for-quarter-of-price.html">more storage for less</a> money.</p>
<p>To be quite frank I did not read the whole post but I did get the message. Google was offering 10GB for $20.00 and now they are offering 20GB for $5.00. This extra storage is mostly for Picasa and web albums but it can be used for other products like GMail (if you ever get above the 7.5GB that you already have there).</p>
<p>Although I was really happy to see such a move, I was kinda saddened since not more than a month ago I decided to purchase the 10GB for $20.00 and I didn&#8217;t take advantage of the new rate. The reason for the extra storage is that I can store pictures and videos for my family to see. Since most of my side of the family is located in Greece and my family and I are in the US, it only makes sense to take advantage of the Internet to keep in touch. Photographs of the different events that we attend are now available to them too, while we keep a good journal of events through the years.</p>
<p>Logging into my web album in Picasa I was pleasantly surprised to see that my storage is not 10GB but <strong>81GB</strong>! I could not believe my eyes and frankly I thought that Google made a mistake. I dug up the blog post and found out what had happened. It appears that by not reading the whole article, I missed this part:</p>
<blockquote><p><em>and people who have extra storage will be automatically upgraded.</em></p></blockquote>
<p>The funny thing is that they even counted the 1GB that Picasa comes with (for free) once you sign up for their web albums.</p>
<p>All I can say is that now I will probably store more and more media online, not only for my family abroad to watch but for backup reasons too.</p>
<p>All we need now is a GDrive &#8211; a drive extension to connect to our online storage so that we can store everything online and never worry about anything &#8211; computer crashes and all&#8230;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.niden.net/2009/11/google-paid-storage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flexible Storage in MySQL [How-To]</title>
		<link>http://www.niden.net/2009/11/flexible-storage-in-mysql/</link>
		<comments>http://www.niden.net/2009/11/flexible-storage-in-mysql/#comments</comments>
		<pubDate>Mon, 16 Nov 2009 18:35:45 +0000</pubDate>
		<dc:creator>Nikolaos Dimopoulos</dc:creator>
				<category><![CDATA[How-To]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.niden.net/?p=13</guid>
		<description><![CDATA[We all need data to function. Whether this is information regarding what our body craves at the moment &#8211; hence go to the local take-away and get it or cook it &#8211; or whether this is electronic data to make our tasks easier, makes no difference. Storing data in an electronic format is always a [...]]]></description>
			<content:encoded><![CDATA[<p>We all need data to function. Whether this is information regarding what our body craves at the moment &#8211; hence go to the local take-away and get it or cook it &#8211; or whether this is electronic data to make our tasks easier, makes no difference.</p>
<p>Storing data in an electronic format is always a challenge. When faced with a new project you always try to outthink the project&#8217;s needs and ensure that you have covered all the possible angles. Some projects though are plain vanilla, since you only need to enter, say, the customer&#8217;s name, surname, address and phone. But what happens when you need to enter data whose type you don&#8217;t know in advance?</p>
<p>This is where flexible storage comes into play. You can develop a database system that will store data efficiently (well, within reason) without knowing what the data will be.</p>
<p>Say we need to build an application that will be given to the customer to store data about his contacts, without knowing which fields the customer needs. Fair enough, storing the name, surname, address, phone etc. of a contact is easy and an expected feature. However, what about a customer that needs to store the operating system each contact uses on their computer? How about storing the contact&#8217;s favorite food recipe, their car mileage, etc.? Information is so diverse that you can predict what is needed up to a point, but after that you just face chaos. Of course if the application we are building is intended for one customer then everything is simpler. But what if our target audience is more than one customer? We certainly cannot fill the database with fields that will definitely be empty for certain customers.</p>
<p>A simple format to store information can be achieved by storing a type and a value. The first field (type_id) will be a numeric one to hold the ID of the field while the second field (field_data) will be of TEXT type for the &#8220;value&#8221;. The reason for the TEXT is that we don&#8217;t know the size of the data that will be stored there. Indexes on both fields can help with speeding up the searches. If you use MySQL 4+ you can opt for a FULLTEXT index on the value field rather than the regular indexing used in previous MySQL versions.</p>
<p>We also need a second table to hold the list of our data types (the type_id field refers to it). This table will have 2 columns: an ID (an AUTO_INCREMENT integer) and a VARCHAR column to hold the description of the field.</p>
<p><strong><span style="font-family: 'courier new', monospace;">CREATE TABLE data_types (<br />
type_id MEDIUMINT( 8 ) UNSIGNED NOT NULL AUTO_INCREMENT,<br />
type_name VARCHAR( 50 ) NOT NULL,<br />
PRIMARY KEY ( type_id )<br />
);</span></strong></p>
<p>The table to store the data in will be as follows:</p>
<p><strong><span style="font-family: 'courier new', monospace;">CREATE TABLE data_store (<br />
cust_id MEDIUMINT( 8 ) UNSIGNED NOT NULL ,<br />
type_id MEDIUMINT( 8 ) UNSIGNED NOT NULL ,<br />
field_data TEXT NOT NULL,<br />
PRIMARY KEY ( cust_id, type_id )<br />
);</span></strong></p>
<p>We also create another index:</p>
<p><strong><span style="font-family: 'courier new', monospace;">ALTER TABLE data_store ADD FULLTEXT (field_data);</span></strong></p>
<p><em>(Note that the FULLTEXT support is a feature of MySQL version 4+)</em></p>
<p>So what does this table do for us? Say we need to store the information of Mr. John Doe, 123 Somestreet Drive, VA, USA, +1 (000) 12345678 who likes cats and has a Ford Mustang.</p>
<p>We first add the necessary fields we need to store in our data_types table. These fields for our example are as follows:</p>
<p><strong>1 &#8211; Title<br />
2 &#8211; Country<br />
3 &#8211; Favorite animal<br />
4 &#8211; Car</strong></p>
<p>The numbers in front are the IDs that I got when entering the data in the table.</p>
<p>Assuming that the customer has a unique id of 1, we are off to store the data in our table. In essence we will be adding 4 records into the data_store table for every contact we have. The cust_id field holds the unique ID for each customer so that we can match the information to a single contact as a block.</p>
<p><strong><span style="font-family: 'courier new', monospace;">INSERT INTO data_store ( cust_id, type_id, field_data)<br />
VALUES<br />
( '1', '1', 'Mr.'),<br />
( '1', '2', 'USA'),<br />
( '1', '3', 'Cat'),<br />
( '1', '4', 'Ford Mustang');</span></strong></p>
<p>That&#8217;s it. Now Mr. John Doe is in our database.</p>
<p>Adding a new field will be as easy as adding a new record in our data_types table. Now with a bit of clever PHP you can read the data_types table and display the matching data from the data_store table.</p>
<p>We can use the above example to store customer data either as a whole or as a supplement. So for instance in our example we can store the customer ID, first name, surname etc. as records in the data_store table too, each with its own data type. Alternatively we can keep the core data in a separate table (storing the first name, surname, address etc.) and link that table with the data_store one.</p>
<p>This approach, although very flexible, has its disadvantages. The first one is that each record has a TEXT field to store data in. This is overkill for data types that are meant to store boolean values or integers. Another big disadvantage is searching through the table. The data is TEXT, and it is also structured vertically in blocks. So if you need to search for everyone living in the USA you will need to first find the type_id representing the Country field in the data_types table and then match it against the field_data field of the data_store table.</p>
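<p>To make that two-step search concrete, here is a sketch that prints the lookup as a single query joining the two tables defined earlier. <strong>lookup_query</strong> is a hypothetical shell helper, not something from this post, and it pastes its arguments into the SQL as they are, so it is only suitable for values you trust.</p>
<pre>#!/bin/bash
# Hypothetical helper: print the two-table lookup for one field name and value.
# The arguments are inserted as-is, so only use it with trusted input.
lookup_query() {
    local type_name="$1"   # description stored in data_types, e.g. Country
    local value="$2"       # value stored in data_store, e.g. USA
    printf '%s\n' \
        "SELECT s.cust_id" \
        "FROM data_store s" \
        "JOIN data_types t ON t.type_id = s.type_id" \
        "WHERE t.type_name = '$type_name'" \
        "  AND s.field_data = '$value';"
}

# Print the query that finds everyone living in the USA
lookup_query Country USA</pre>
<p>The printed statement can be pasted into any MySQL client, or piped straight into the mysql command line tool.</p>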
<p><span style="text-decoration: underline;">There is no single right way of doing something in programming</span>. It all depends on the circumstances and of course on the demands of the application we are developing.</p>
<p>This is just another way of storing data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.niden.net/2009/11/flexible-storage-in-mysql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
