24 Feb 2010

Database Operations on a GPU

A paper titled Accelerating SQL Database Operations on a GPU with CUDA:

This paper focuses on accelerating SELECT queries and describes the considerations in an efficient GPU implementation of the SQLite command processor. Results on an NVIDIA Tesla C1060 achieve speedups of 20-70X depending on the size of the result set.

via http://www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpgpu10.pdf

Very interesting. People are trying to move away from relational databases because of their complexity, but surely a 20-70X performance increase will make them reconsider. One thing, though: I always thought a database was more of an I/O-bound system than a CPU-bound one, so I didn't expect that adding more processing power with a GPU could give such a performance increase. Maybe one day we will see more servers equipped with multiple high-end GPUs.

5 Nov 2008

Data Masking

What is data masking?

Let me give you an example. You have developed this super-cool system that can crunch big data instantly (just an example). You show it to your client, and he wants a demo using production-like data. But sometimes you can't just copy the data as is, because it contains confidential information such as names, social security numbers, email addresses, etc. You need something like that data, but not exactly that data.

To protect the confidential data, one usually modifies it (by UPDATE-ing all the sensitive fields to scramble them). I had an experience like this, where we used dummy data that contained people named "ABCDE". It was difficult to debug, because everything looked the same. Another way to do this is with data masking. With data masking, you can anonymize your data while preserving its properties (length, field type, format). See the picture below.
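
To make the idea concrete, here is a minimal Python sketch of format-preserving masking. It is only my illustration of the general idea, not how Oracle's Data Masking option works: the mask_value function, the secret parameter, and the sample row are all made up for this example. Each ASCII letter is replaced by a random letter of the same case and each digit by a random digit, while punctuation and length stay the same.

```python
import hashlib
import random
import string


def mask_value(value: str, secret: str = "demo-secret") -> str:
    """Return a masked copy of value with the same length and character classes.

    Hypothetical helper for illustration only; not a cryptographically
    strong format-preserving encryption scheme.
    """
    # Seed a private RNG from the secret plus the value, so the same input
    # always produces the same masked output (useful when the same value
    # appears in several tables and must still match after masking).
    digest = hashlib.sha256((secret + value).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))

    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            # Keep separators such as '-', '@', '.', and spaces so the
            # format (SSN, email, phone number) is still recognisable.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Made-up sample row, not real data.
    row = {
        "name": "Jane Doe",
        "ssn": "123-45-6789",
        "email": "jane.doe@example.com",
    }
    masked = {column: mask_value(value) for column, value in row.items()}
    print(masked)
```

Unlike the "ABCDE" dummy data, every masked row still looks different from the others, which makes debugging much easier. A real tool would also need to handle things like uniqueness constraints and valid check digits, which this sketch ignores.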

Its main use is in development and test environments, where people need production-like data but are constrained by confidentiality issues. Another example is when you want to outsource some parts of your system: you can provide test data without revealing your customers' phone numbers. I remember one time when I worked as a developer, and our test data contained data about people in a city who were customers of the only electricity company in the country (which is basically everyone). So I was able to see who hadn't paid their electricity bill, who spent the most on electricity, and their telephone numbers. Even for testing, we used the production database, and tested the system by paying the electricity bills of certain people (they got free electricity for a month). I don't think they have a policy on customers' confidential data, but I think they should, and if they do have that kind of policy, they're in big trouble. I don't want some outsourced developers messing up the electricity in my house by changing my payment status, so that the electricity company cuts off my electricity, and then the developers come to my house and laugh.

So, back to the topic: Oracle has a product option for its database called Data Masking. You should check it out.


16 Oct 2008

Oracle SQL Developer Data Modelling Tools

It's a very nice tool, I think. The text below is taken from the web page, with some highlights from me. Note that this is an Early Adopter release.

Oracle SQL Developer Data Modeling is the latest product offering to join the Oracle Database Tools suite. SQL Developer Data Modeling offers a full spectrum of data and database modeling tools and utilities, including Entity Relationship modeling, Relational (Database Design), Data Type and Multidimensional modeling, full forward and reverse engineering and code generation. It includes importing from and exporting to a variety of sources and targets, provides a variety of formatting options and validates the models through a predefined set of Design Rules.
SQL Developer Data Modeling can connect to any Oracle Database version 9.2.0.1 and later, and is platform independent. Initially available as a standalone product, with future releases available as an extension to Oracle SQL Developer. The first Early Adopter release is stand alone and file-based only.


Awesome !!


25 Sep 2008

HP Oracle Database Machine

Hardware by HP, Software by Oracle 
  • Extreme performance
  • Unlimited scalability
  • Enterprise ready

   

Amudi Sebastian's Posterous

Hi, my name is Amudi. I make apps, websites, and (sometimes) video games in Singapore. Here, you will find some interesting stuff I found on the internet, and probably boring writings that I wrote.


You can email me: amudi@amudi.org


*The views expressed on this blog are my own and do not necessarily reflect the views of my current or past employer*