<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Krzykawski.com &#187; Scripts</title>
	<atom:link href="http://krzykawski.com/category/scripts/feed/" rel="self" type="application/rss+xml" />
	<link>http://krzykawski.com</link>
	<description>Opensource, work, projects - sharing the fun.</description>
	<lastBuildDate>Thu, 29 Dec 2011 09:46:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>UPDATE on huge table without index</title>
		<link>http://krzykawski.com/2010/09/15/update-on-huge-table-without-index/</link>
		<comments>http://krzykawski.com/2010/09/15/update-on-huge-table-without-index/#comments</comments>
		<pubDate>Wed, 15 Sep 2010 13:10:45 +0000</pubDate>
		<dc:creator>robertk</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Scripts]]></category>
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://krzykawski.com/?p=143</guid>
		<description><![CDATA[This is something that keeps coming back no matter where I work. I always solve it the same way, but it takes me a minute to remember how I did it last time. I guess it&#8217;s time to share something super easy. Scenario: You have a huge table with constant activity, containing terabytes of data. You [...]]]></description>
			<content:encoded><![CDATA[<p>This is something that keeps coming back no matter where I work. I always solve it the same way, but it takes me a minute to remember how I did it last time. I guess it&#8217;s time to share something super easy.</p>
<p><strong>Scenario:</strong><br />
You have a huge table with constant activity, containing terabytes of data. You need to update or delete roughly a million random rows. Random selects from the table keep some of the indexes you could use hot. Running one big update that locks rows or the whole table is not an option, since the table receives approx. 40-50 inserts/sec. So you need to run the update in smaller batches. LIMIT would be nice, but UPDATE and DELETE do not support LIMIT with an OFFSET.</p>
<p>Updating using IN with a subquery that contains LIMIT is not supported in 5.0.x (unsure about 5.1.x and later); otherwise this would have been a viable solution.</p>
<p>Selects with specific WHERE clauses might also take longer because some indexes/keys are too big.</p>
<p><strong>Solution:</strong><br />
Select all the unique/primary keys of the rows you need into a temporary table that has its own auto-increment primary key. Then update the production table in batches, joining against the temporary table and using a BETWEEN clause on its auto_increment field to pick which rows each batch touches.<br />
You may also enclose each batch in a transaction and delete the same BETWEEN range from the temporary reference table. That helps you keep track of what you actually did in case you need to abort.</p>
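<p>A minimal sketch of one such batch, written as a bash function that only prints the SQL so it can be inspected before it is piped to the mysql client. The function name batch_sql and its arguments are made up for illustration; the table and column names are the ones from the example further down. Note that a ROLLBACK only undoes the DELETE if the reference table uses a transactional engine such as InnoDB. A MyISAM reference table keeps the delete even when the transaction is aborted.</p>

```shell
#!/bin/bash
# Hypothetical helper: print the SQL for one batch. It updates the live rows
# and removes the processed ids from the reference table in one transaction.
# The join columns (request_id, prod_id, id) follow the example tables.
batch_sql() {
    local live_tbl=$1 ref_tbl=$2 set_clause=$3 from_id=$4 to_id=$5
    printf 'START TRANSACTION;\n'
    printf 'UPDATE %s ht JOIN %s ti ON ht.request_id = ti.prod_id SET %s WHERE ti.id BETWEEN %d AND %d;\n' \
        "$live_tbl" "$ref_tbl" "$set_clause" "$from_id" "$to_id"
    printf 'DELETE FROM %s WHERE id BETWEEN %d AND %d;\n' "$ref_tbl" "$from_id" "$to_id"
    printf 'COMMIT;\n'
}

# Print the statements for reference ids 1..1000.
batch_sql huge_table tmp_ids "ht.action_id = 100" 1 1000
```

<p>Printing instead of executing keeps the sketch safe to try out; the output can be piped to the mysql client once it looks right.</p>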
<p><strong>Example</strong><br />
Production table:</p>
<blockquote><p>CREATE TABLE `huge_table` (<br />
`request_id` bigint(20) NOT NULL auto_increment,<br />
`customer_id` int(11) default NULL,<br />
`data_id` bigint(20) default NULL,<br />
`group_id` int(11) default NULL,<br />
`user_id` int(11) default NULL,<br />
`action_id` int(11) default NULL,<br />
`external_id` int(11) default NULL,<br />
`user_data` varchar(2048) default NULL,<br />
`entry` text,<br />
`extra_term` varchar(255) default NULL,<br />
`transaction` text,<br />
`receive_time` datetime default NULL,<br />
`from_id` varchar(32) default NULL,<br />
`url` varchar(2048) default NULL,<br />
`from_ip` varchar(2048) default NULL,<br />
`useragent` varchar(2048) default NULL,<br />
`tz` int(11) default NULL,<br />
`cvalue` varchar(255) default NULL,<br />
`rvalue` varchar(255) default NULL,<br />
`uid` varchar(40) default NULL,<br />
PRIMARY KEY  (`request_id`),<br />
KEY `k1` (`receive_time`,`action_id`,`url`,`from_id`)<br />
) ENGINE=InnoDB AUTO_INCREMENT=1047142423 DEFAULT CHARSET=utf8</p></blockquote>
<p>You need to set action_id = 100 on approx. 1.5m rows which have receive_time between '2010-08-01 00:00:00' and '2010-08-31 23:59:59' and where from_id equals 'q7b4x5aa0303erer'.</p>
<p>Temporary table:</p>
<blockquote><p>CREATE TABLE `tmp_ids` (<br />
`id` int(11) NOT NULL auto_increment,<br />
`prod_id` bigint(20) default NULL,<br />
PRIMARY KEY  (`id`)<br />
) ENGINE=MyISAM AUTO_INCREMENT=15424160 DEFAULT CHARSET=utf8</p></blockquote>
<p>Fill the temporary table with the ids of the rows you need to update in the production table:</p>
<blockquote><p>INSERT INTO tmp_ids (prod_id) SELECT request_id FROM huge_table USE INDEX (k1) WHERE from_id='q7b4x5aa0303erer' AND receive_time BETWEEN '2010-08-01 00:00:00' AND '2010-08-31 23:59:59';</p></blockquote>
<p>Then just iterate this in a bash script or your favorite scripting language (increasing the BETWEEN values for each batch, of course):</p>
<blockquote><p>UPDATE huge_table ht JOIN tmp_ids ti ON ht.request_id=ti.prod_id SET ht.action_id=100 WHERE ti.id BETWEEN x AND y;</p></blockquote>
<p>Example bash script:</p>
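<p>As a rough sketch of how the BETWEEN values can be stepped, the helper below prints one inclusive id range per batch and turns each range into the UPDATE statement above. The function name batch_ranges and the sample numbers (ids 1 to 2500, 1000 per batch) are assumptions for illustration only.</p>

```shell
#!/bin/bash
# Hypothetical helper: print one "start end" pair per batch, covering
# min_id..max_id inclusive in steps of batch_size, without overlap.
batch_ranges() {
    local min_id=$1 max_id=$2 batch_size=$3
    local start=$min_id end
    while [ "$start" -le "$max_id" ]; do
        end=$((start + batch_size - 1))
        # Clamp the last batch to the highest id.
        if [ "$end" -gt "$max_id" ]; then
            end=$max_id
        fi
        echo "$start $end"
        start=$((end + 1))
    done
}

# Turn each range into an UPDATE statement (printed, not executed).
batch_ranges 1 2500 1000 | while read -r x y; do
    echo "UPDATE huge_table ht JOIN tmp_ids ti ON ht.request_id=ti.prod_id SET ht.action_id=100 WHERE ti.id BETWEEN $x AND $y;"
done
```

<p>Because BETWEEN is inclusive on both ends, each range ends one id before the next one starts, so no reference row is processed twice.</p>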
<blockquote><p>#!/bin/bash</p>
<p># Config params<br />
user="username"<br />
pass="password"<br />
host="hostname"<br />
db="schema_name"<br />
tmptbl="tmp_table"<br />
livetbl="live_table"<br />
tmpcol="live_id"<br />
livecol="request_id"<br />
tmpkey="id"<br />
# SET clause used by the update statement in the loop<br />
setval="from_id=1"<br />
waittime="0.5"</p>
<p># Print usage function<br />
function usage() {<br />
	echo "$(basename "$0") &lt;rows&gt;"<br />
	echo "rows - number of rows in one go."<br />
	exit 1<br />
}</p>
<p># Since we need at least one argument to continue, check that the first one is supplied<br />
[ -z "$1" ] &#038;&#038; usage</p>
<p># get that variable<br />
rows=$1<br />
# Check which is the first key we will use in temp table<br />
s=`mysql -N -u$user -p$pass -h$host $db -e"select min($tmpkey) from $db.$tmptbl;"`<br />
# Set latter between value (BETWEEN is inclusive, so subtract 1 to get exactly $rows ids)<br />
let b=$s+$rows-1;<br />
# Count how many rows we have to update so we know how long to loop<br />
count=`mysql -N -u$user -p$pass -h$host $db -e"select count(*) from $db.$tmptbl;"`;</p>
<p># Print starting line<br />
echo "Starting up with updates from id: $s to $b, performing $rows at one go.."</p>
<p># loop till we are done<br />
while [ 1 ];<br />
do</p>
<p># are we done? If so leave the loop so the final message gets printed<br />
	if (($count <= 0));<br />
		then<br />
		break<br />
	fi</p>
<p># Run actual db updates<br />
	echo -n "updating $livetbl between $s and $b.."<br />
	mysql -N -u$user -p$pass -h$host $db -e"update $livetbl td join $tmptbl i on td.$livecol=i.$tmpcol set $setval where i.$tmpkey between $s and $b"<br />
	echo -n " deleting in $tmptbl.."<br />
	mysql -N -u$user -p$pass -h$host $db -e"delete from $db.$tmptbl where $tmpkey between $s and $b"<br />
	echo -n " Done!"</p>
<p># set between rows<br />
	let s=$s+$rows<br />
	let b=$b+$rows<br />
	let count=$count-$rows</p>
<p># wait if there are more records to update<br />
	if (($count > 0));<br />
	then<br />
		echo " ..waiting.. still $count records to update";<br />
		sleep $waittime<br />
	fi<br />
done</p>
<p># All done<br />
echo "Script done. No more records to update"</p></blockquote>
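<p>The script passes the password with -p$pass, which makes it visible in the process list while mysql runs. One possible alternative, sketched here under the assumption that a temporary option file is acceptable in your setup, is to write a [client] section to a file and hand it to the mysql client with --defaults-extra-file. The helper name write_client_cnf and the sample credentials are placeholders.</p>

```shell
#!/bin/bash
# Hypothetical helper: write a client option file readable only by the
# current user, so the password is not passed on the command line.
write_client_cnf() {
    local file=$1 user=$2 pass=$3 host=$4
    ( umask 077
      printf '[client]\nuser=%s\npassword=%s\nhost=%s\n' "$user" "$pass" "$host" > "$file" )
}

cnf=$(mktemp)
write_client_cnf "$cnf" "username" "password" "hostname"
# --defaults-extra-file has to be the first option on the command line, e.g.:
#   mysql --defaults-extra-file="$cnf" -N schema_name -e "select 1"
cat "$cnf"
rm -f "$cnf"
```

<p>Passwords containing special characters may need quoting inside the option file, which this sketch does not handle.</p>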
]]></content:encoded>
			<wfw:commentRss>http://krzykawski.com/2010/09/15/update-on-huge-table-without-index/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>using the percent (%) character in crontab</title>
		<link>http://krzykawski.com/2010/07/15/using-the-percent-character-in-crontab/</link>
		<comments>http://krzykawski.com/2010/07/15/using-the-percent-character-in-crontab/#comments</comments>
		<pubDate>Thu, 15 Jul 2010 18:46:23 +0000</pubDate>
		<dc:creator>robertk</dc:creator>
				<category><![CDATA[Scripts]]></category>
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://krzykawski.com/?p=109</guid>
		<description><![CDATA[Long time no blogging.. I have been using cron since I started using Linux, ages back. I have never needed to use a &#8220;%&#8221; character within the command line before, since I have been encapsulating everything in scripts. Yesterday I wanted to make one of the scripts here at Marin Software a tad more generic, [...]]]></description>
			<content:encoded><![CDATA[<p>Long time no blogging.. <img src='http://krzykawski.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I have been using cron since I started using Linux, ages back. I have never needed to use a &#8220;%&#8221; character within the command line before, since I have been encapsulating everything in scripts. Yesterday I wanted to make one of the scripts here at Marin Software a tad more generic, and added the line:</p>
<blockquote><p><code>0 8 * * * ~/scripts/script.sh "`date +'%F %H:%M:%S' -d yesterday`"</code></p></blockquote>
<p>to crontab. Now, this did not work.</p>
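<p>I did not know that an unescaped &#8220;%&#8221; character denotes a new line in crontab: cron cuts the command at the first &#8220;%&#8221; and sends everything after it to the command as standard input. Escaping each percent sign with a backslash (\%) makes cron pass it through literally. Anyway, this is useful if you want to run a script with some percent characters in the input, or if you want to run a mysql query containing % in your crontab line.</p>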
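<p>According to crontab(5), percent signs in the command are changed into newlines unless they are escaped with a backslash. A small sketch of a helper that does the escaping before a command is pasted into a crontab line; the function name cron_escape is made up for illustration.</p>

```shell
#!/bin/bash
# Hypothetical helper: escape every % in a command so that cron passes it
# through literally instead of turning it into a newline.
cron_escape() {
    printf '%s\n' "$1" | sed 's/%/\\%/g'
}

# Prints the date command with each % written as \%.
cron_escape "date +'%F %H:%M:%S' -d yesterday"
```

<p>The escaped text is what goes into the crontab entry; cron strips the backslashes before it runs the command.</p>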
<p>I guess you learn something new every day. This might be a simple thing that you already know, but I just wanted to share it <img src='http://krzykawski.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://krzykawski.com/2010/07/15/using-the-percent-character-in-crontab/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

