UPDATE on huge table without index

This is something that keeps coming back no matter where I work. I always end up doing it the same way, but it takes a minute to remember how I did it last time. I guess it's time to share something super easy.

You have a huge table with constant activity, containing terabytes of data. You need to update/delete roughly a million random rows. Random selects from the table keep some indexes you could use hot. Running one big update that locks rows or the whole table is not an option, since you have approx. 40-50 inserts/sec going into the table. So you need to run smaller batches when updating the table. LIMIT would be nice, but OFFSET is not supported with UPDATE or DELETE.

Updating using IN with a subquery containing LIMIT is not supported in 5.0.x (unsure about 5.1.x and forward); otherwise this would have been a viable solution.
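For the record, this is the sort of statement MySQL 5.0.x rejects with "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'" (a sketch using the tables from the example further down):

```sql
-- Fails on MySQL 5.0.x: LIMIT is not allowed inside an IN subquery
UPDATE huge_table
SET action_id = 100
WHERE request_id IN (
    SELECT prod_id FROM tmp_ids LIMIT 10000
);
```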

Selects with specific WHERE clauses might also take longer, due to some indexes/keys being too big.

The solution: select all unique/primary keys into a temporary table that has an auto-increment primary key, then update the production table using this temporary table as a reference for which rows to update, with a BETWEEN clause on the temporary table's auto_increment field.
You may also enclose everything in a transaction and delete the same BETWEEN range from the temporary reference table. This will help you keep track of what you actually did in case you need to abort.

Production table:

CREATE TABLE `huge_table` (
`request_id` bigint(20) NOT NULL auto_increment,
`customer_id` int(11) default NULL,
`data_id` bigint(20) default NULL,
`group_id` int(11) default NULL,
`user_id` int(11) default NULL,
`action_id` int(11) default NULL,
`external_id` int(11) default NULL,
`user_data` varchar(2048) default NULL,
`entry` text,
`extra_term` varchar(255) default NULL,
`transaction` text,
`receive_time` datetime default NULL,
`from_id` varchar(32) default NULL,
`url` varchar(2048) default NULL,
`from_ip` varchar(2048) default NULL,
`useragent` varchar(2048) default NULL,
`tz` int(11) default NULL,
`cvalue` varchar(255) default NULL,
`rvalue` varchar(255) default NULL,
`uid` varchar(40) default NULL,
PRIMARY KEY (`request_id`),
KEY `k1` (`receive_time`,`action_id`,`url`,`from_id`)
);

You need to update action_id = 100 on approx. 1.5M rows which have receive_time between '2010-08-01 00:00:00' and '2010-08-31 23:59:59' and from_id equal to 'q7b4x5aa0303erer'.

Temporary table:

CREATE TABLE `tmp_ids` (
`id` int(11) NOT NULL auto_increment,
`prod_id` bigint(20) default NULL,
PRIMARY KEY (`id`)
);

Fill the temporary table with the IDs you need to update in the production table:

INSERT INTO tmp_ids (prod_id) SELECT request_id FROM huge_table USE INDEX (k1) WHERE from_id='q7b4x5aa0303erer' AND receive_time BETWEEN '2010-08-01 00:00:00' AND '2010-08-31 23:59:59';

Then just iterate this in a bash script or your favorite scripting language (increasing the BETWEEN values of course):

UPDATE huge_table ht JOIN tmp_ids ti ON ht.request_id=ti.prod_id SET ht.action_id=100 WHERE ti.id BETWEEN x AND y;
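As mentioned above, each batch can also be wrapped in a transaction together with the matching delete from the reference table, so an aborted run leaves no ambiguity about which rows were already done. A sketch, where x and y are the current batch boundaries (this assumes an InnoDB table; with MyISAM the transaction is a no-op):

```sql
START TRANSACTION;

UPDATE huge_table ht
JOIN tmp_ids ti ON ht.request_id = ti.prod_id
SET ht.action_id = 100
WHERE ti.id BETWEEN x AND y;

-- Remove the processed range from the reference table in the same
-- transaction, so tmp_ids always reflects what is left to do.
DELETE FROM tmp_ids WHERE id BETWEEN x AND y;

COMMIT;
```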

example bash script (the config values below are placeholders - adjust them to your environment):

#!/bin/bash

# Config params - placeholders, adjust to your environment
user="dbuser"
pass="dbpass"
host="localhost"
db="mydb"
livetbl="huge_table"        # production table
tmptbl="tmp_ids"            # temporary reference table
livecol="request_id"        # PK column in the production table
tmpcol="prod_id"            # reference column in the temp table
tmpkey="id"                 # auto_increment column in the temp table
setval="td.action_id=100"   # SET clause for the update
waittime=10                 # seconds to sleep between batches

# Print usage function
function usage() {
    echo "$(basename $0) rows"
    echo "rows - number of rows in one go."
    exit 0
}

# Since we need at least one variable to continue, check that the first variable is supplied
[ -z "$1" ] && usage

# get that variable
rows=$1

# Check which is the first key we will use in the temp table
s=`mysql -N -u$user -p$pass -h$host $db -e "select min($tmpkey) from $db.$tmptbl;"`
# Set latter between value
let b=$s+$rows
# Count how many rows we have to update so we know how long to loop
count=`mysql -N -u$user -p$pass -h$host $db -e "select count(*) from $db.$tmptbl;"`

# Print starting line
echo "Starting up with updates from id: $s to $b, performing $rows at one go.."

# loop till we are done
while [ 1 ]; do
    # are we done? If so stop looping
    if (($count <= 0)); then
        break
    fi

    # Run actual db updates
    echo -n "updating $livetbl between $s and $b.."
    mysql -N -u$user -p$pass -h$host $db -e "update $livetbl td join $tmptbl i on td.$livecol=i.$tmpcol set $setval where i.$tmpkey between $s and $b"
    echo -n " deleting in $tmptbl.."
    mysql -N -u$user -p$pass -h$host $db -e "delete from $db.$tmptbl where $tmpkey between $s and $b"
    echo " Done!"

    # set between rows
    let s=$s+$rows
    let b=$b+$rows
    let count=$count-$rows

    # wait if there are more records to update
    if (($count > 0)); then
        echo " ..waiting.. still $count records to update"
        sleep $waittime
    fi
done

# All done
echo "Script done. No more records to update"

iPhone time zone issue – Resolved

So, ever since I got my iPhone 4, I have had issues where my calendar has been showing and alerting me about my appointments one hour ahead.

I could enter an appointment today at 5pm on my phone, choosing to alert 30 minutes ahead of the appointment. The result would be that the phone would alert me at 3:30pm instead of 4:30pm, even though it showed the correct time in the calendar on the phone.

On a side note, both MobileMe and our company's Exchange would save the appointment at 4pm. Now, this has to do with my time zone settings. I was playing around with those, but no real change; it was always displaying one hour off from what I entered. Also, if I entered an appointment at the right time in any of the above calendars, the iPhone calendar would show it one hour in advance.

I booked an appointment at the Genius Bar at Apple, and they were as stunned as me. But playing around with the phone together, we found a second time zone setting in Settings->Mail, Contacts, Calendar called Time Zone Support. Apparently, if you want time zones to work when you travel (to have your appointments at the right time where you are located), you have to turn Time Zone Support OFF! (?) This will shift your appointments to local time when you move between time zones.

I guess I might not be the only one with these issues.. So sharing.. 🙂

Oracle.. And so it begins..

Oracle sues Google over Android and Java

Ever since Oracle announced the acquisition of Sun Microsystems, I have been thinking something like this would happen. Oracle is not exactly known for contributing to the Open Source community, even if they claim they are committed to helping and enhancing it. It feels like their ultimate goal is to close source everything. I can't understand the nature of such an organization. Isn't shared knowledge more knowledge? We all want to evolve, and what better way than not reinventing the wheel over and over again. Soon Oracle will want you to pay a license fee for every execution of your java application (per processor) 🙂

I am dreading what they will do with MySQL when they get there.. Same licensing model as Oracle DB? 🙂

Just had to comment on this, as I think it is ridiculous and def. not adding anything to the community.

Why not to fly RyanAir.

I guess it’s time for me to be negative about something..
I am quite a frequent flyer. I usually select a Star Alliance flight when flying anywhere, because my Star Alliance status gives me quite a lot of benefits.
Every now and then, I try a different airline, just to compare. Recently I flew with RyanAir. Normally I would let one mishap go, but the outrageously bad customer service I received on both legs of my trip cannot go uncommented.

RyanAir’s attitude when something has gone wrong and they are to blame is: We could not care less about you and your travels.

To keep the story as short as possible, I will not include any smaller complaints that I have.
I take a cab to Stansted from London pretty early in the morning to make my flight. The Stansted Express would only leave me with an hour or so if I caught the first train. Keep in mind that a taxi to Stansted is pretty expensive, so any savings you think you are making by flying RyanAir, you are paying to the taxi driver.
At Stansted, the lines are enormous. I have a little less than 2 hours to catch my flight, but after standing in line for 40 minutes, I walk up to the counter to ask what I should do, as my gate closes in 50 minutes. The chap behind the counter tells me to stand in line. If I get as close as 10 minutes before gate closure, I should check in my bag at the far end of the counters; apparently there is an express check-in there. I think it sounds strange to do it that close to gate closure, so I walk to the express check-in at once. After standing in line for 10 minutes, I approach the man behind the counter at the express check-in. I tell him that my gate closes in 40 minutes, and that I am worried about my bag making it. He responds that I am too late, and that I can't check in my bag. Whoa, wait a minute, I respond. I tell him what his colleague said, and point him out (7-8 counters away), but I get the response that rules are rules and that he can't change them. So I tell him again that if I had stood in line another 30 minutes and come here even closer to gate closure, there is no way I could have checked in my bag, and that he needs to sort the situation out; I am not at fault here. The response is that I am at fault: I should know the rules, and I should not trust RyanAir personnel, as it is my responsibility to follow the rules, and not theirs to point them out. He gives me the option to throw away my bag (!) or try to check it in at the gate and pay extra. I explain that I already paid to have my bag checked in, and they messed it up. Should I now pay again because they messed up? He tells me that is the only way if I want to board the plane and have my bag with me.
At the gate, I explain the whole thing, but they don't care: Sir, if you want your bag and want to fly on this aircraft, you need to check it in and pay the associated fees.
There is just no way to reason with these people, and they don't care whose fault it is. They want your money, and they want it yesterday. Think again before flying RyanAir. They might seem cheap at first, but mistakes made by them, you pay for.

Here is another set of horror stories about ryanair: davefaqs, i-hate-ryanair, jacobsens blog, dont fly ryanair, holidays ruined, ryanair campaign, bbc’s panorama episode

using the percent (%) character in crontab

Long time no blogging.. 🙂

I have been using cron since I started using Linux, ages back. I have never needed to use a "%" character on the command line before, since I have been encapsulating everything in scripts. Yesterday I wanted to make one of the scripts here at Marin Software a tad more generic, and added the line:

0 8 * * * ~/scripts/script.sh "`date +'%F %H:%M:%S' -d yesterday`"

to crontab. Now, this did not work.

I did not know that the "%" character denotes a newline in crontab; everything after the first unescaped % is passed to the command's standard input. Anyway, this is useful to know if you want to run a script with percent characters in the input, or a mysql query containing %, directly from a crontab line.
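The fix is to escape each percent sign with a backslash, so cron passes it through literally instead of treating it as a newline:

```shell
# crontab entry with escaped percent signs (what the line above should be):
# 0 8 * * * ~/scripts/script.sh "`date +'\%F \%H:\%M:\%S' -d yesterday`"

# From an interactive shell no escaping is needed; the same date command
# prints yesterday's date as "YYYY-MM-DD HH:MM:SS":
date +'%F %H:%M:%S' -d yesterday
```

Note that the escaping is only needed inside the crontab file itself; once the command reaches the shell, the backslashes are already gone.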

I guess you learn something new every day, even though this might be a simple thing (you may already know), I just wanted to share it 😉

Western Digital Customer Support

Just a quick update and information on the drive failure issue in the drobo.

As you read before, after extensive testing I realized that the drive was faulty. I created an RMA with WDC and called them, since I was requested to send in my faulty drive at my own expense. I got the answer: yes, you need to pay for shipping. So I packed the drive, but just before I sent it off, I emailed customer support at WDC.

After some days of email conversation the outcome is: the drive was DOA (Dead On Arrival), which qualifies for WDC-paid shipping. However, since I did not install the drive right away (it took a week) and did extensive testing to make sure the drive was broken before shipping it out, they will not pay shipping. Why? Because you have 10 days to get back to them in case the drive was DOA. If you don't, you pay the shipping.

So, if an exchanged drive is showing any signs of failure, don't bother testing it to make sure. Just contact WDC right away and have them instruct you on what to do. As long as your initial report is within 10 days of receiving the drive, you are safe, no matter how much testing you have to do after the initial contact.

Oh, by the way: I am not complaining about WDC. Their rules are their rules. It's just too bad I did not know anything about this and was not informed about it when I received the drive in question. WDC customer support has been helpful, and I understand that the agent who has been in touch with me can't stretch the case further. Her managers said no, and that's it for her. It's my own fault that I did not act sooner. This post is purely informational.

Just my five cents worth of..

Drobo update

I have received some questions about what happened with the drobo.

Apparently, one of the WD20EADS 2TB drives purchased at the same time as the drobo was faulty. It reported fine in S.M.A.R.T. tests, so the drobo thought all was fine and wrote to it. The drive could write data, but it was as slow as back in the 1980s. Needless to say, I sent it in to WD for a replacement.

It took two weeks to get it replaced. Once the new drive arrived, I had already bought another two WD20EADS drives and put them in the drobo. It was therefore not urgent to insert the replaced drive.

Drobo has been working fine. Some days ago I inserted the replacement drive. The drobo blinked yellow for a while and then everything was fine (all green). I did not have any data on the drobo at that time, since I had been running tests but had deleted the test files. The yellow blinking (while it was syncing data) was pretty fast; I am guesstimating it at approx. 1-2 minutes.

Yesterday I decided to move some data over to the drobo. It all went fine, but the drobo notified me it had a drive failure. Looking at the drobo itself, sure enough, there is a red LED blinking. Guess which drive?

Today I am sending back the failed drive to WD. What is really bothering me is that I received a faulty drive from them (in exchange for my other faulty drive), and I am forced to pay the return shipping for it.

Overall, I am happy with the drobo, but after reading a lot of complaints online about people having problems with drobos losing data, I am having doubts. I will do some hardcore testing with it to make sure my data is not lost. The whole point is to have a secure, easily accessible, and easily expandable storage subsystem. What I am doubting is not any of the latter, but the fact that it might not be secure. I had some ideas about building my own storage subsystem using a small server running OpenSolaris with direct attached storage (DAS) and ZFS. There are some issues, though. I will go into more detail later.

By the way, anyone attending MySQL UC 2010? I’ll be there!


Long time since my last post.. But I will not bore you guys with some long and extensive post.

Recently I came across Drobo, and the model I chose was the DroboPro. Pretty neat little box. For those of you who do not know what it is, check it out here: drobopro details. Basically it's a box for 8 SATA drives that dynamically takes care of your RAID volumes. You set it up with a kind of thin provisioning, and then just add drives as you go. Drobo calls it BeyondRAID technology. With the drives currently on the market, you can theoretically get 16TB of storage, all attached through FW800, USB or iSCSI. I bought it for private use, primarily to solve my current and future storage needs in an easy way, but also to test the unit. This can be a cheap online backup storage option for small to medium sized companies. It has a rack mount option, so for those that don't need and don't have the cash for a NexSAN SATABeast, SASBeast, SATABoy or SASBoy, this could be an option. And what an option, considering the BeyondRAID technology..

So far, I have unpacked the Drobo and hooked it up to the computer of choice (Mac mini) through FW800. I have wanted to do some performance tests, but am currently stuck with very poor performance. I have 3x2TB WD green drives (WD20EADS), which gives me a total of 6TB raw capacity (4TB with data protection). I have been in contact with Drobo support when I had time left over between work and travel, and we are slowly solving the problem. It might be a hardware fault in the drobo itself, but I will get back to this topic later. So far, even though the unit has not met my personal needs, I am excited about it, and hope that I will get it working with good performance soon.

MySQL UC HA talk

Just wanted to publish the MySQL HA talk presentation slides as promised before. They can be found on:


Will also write a short summary on MySQL UC tomorrow.

Day three of MySQL UC 2009

Day three has started and already half of it has passed. I did not post any entries during the other days, because I wanted time to digest all the information that you get fed here. A lot of new ideas and a lot of new questions have been raised. I have enjoyed a lot of the keynotes that have been held, including this morning's.

Those of you who have been checking this blog on occasion have seen that I was supposed to speak today at 11.55am. Said and done, my presentation together with Anders Karlsson from MySQL/Sun got under way at that time. The session was actually pretty crowded, considering it was just before lunch time.

This was my first presentation ever, so the first minutes were a bit nervous. I will be posting the presentation files here, and you are welcome to mail me if you have any questions or want to talk about ideas that you have. Maybe I can help shed some light on issues you are facing, or if you are just hesitant about some choice. When it comes to performance, we/I have tried pretty much everything out there: hardware and software.

If you attended the session, I would like to thank you for your attention and wish you happy HAing. 😉