Archive for the ‘Database’ Category
Git- The Distributed VCS That Makes A Boom!
Friday, October 7th, 2011Version controlling is one of the main tasks in developing an application, no matter how big the application is. The developer will enjoy developing applications when file version control system (VCS) is used. The software life cycle consists of many phases. The major phases are “Integration” and “Maintenance”. VCS is a solution for many problems that a developer faces during the development and maintenance phases of the software life cycle. VCS helps a developer to complete these phases without much struggle, by keeping a log of all changes made by the developer
If you develop and deploy medium to large scale applications, you would have used version control system. The most widely used version control system is Apache Subversion. Apache Subversion or SVN in short, is a Centralized Version Control System. In this system, there is a single central repository. All changes made to documents/codes are saved in this repository. There can be many users (or developers) but all the changes are updated to this central database instantly. (more…)
Proof Your Site Against SQL Injection
Friday, October 23rd, 2009Web applications often accept input data from users and pass it to database systems. Problems happen when such data contain characters that have special meaning to the database. For example, the single quote (‘) is used by most database systems to terminate a string. Different databases have different such meta-characters. Hackers skilfully exploit the use of meta-characters to run SQL commands in the database.
Consider this code fragment that accepts an employee id from a web form and fetches employee details from the Employee table:
$empId = _GET["employee_id"];
$query = ”SELECT * FROM Employee WHERE emp_id = ” . $empld
If a user fills in the employee_id field with
1 ; UPDATE Employee SET salary = 5000
that would result in the following queries to execute:
SELECT * FROM Employee WHERE emp_id = 1 ; UPDATE Employee SET salary = 5000
Many database systems like Postgre allow multiple queries to execute in a single command. It’s anybody’s guess as to what would happen if the above queries execute. The hacker is exploiting the semi-colon (;) meta-character, which is used as a statement separator in many database systems.
The above is only one example; there are in fact hundreds of ways in which other meta-characters can be exploited to execute a variety of harmful commands.
So how do you protect your application against SQL injection? Some steps are outlined below.
- Escape the special characters before you pass it on to the database. You need to figure out all the meta-characters used by your database and escape them, so they are interpreted as regular data and not as control characters.
- Avoid dynamic SQL generation in your code. Use prepared statements instead. When prepared statements are used, the database will not confuse data for control characters.
- Stored Procedures offer some degree of protection, but are not injection-proof either. They are vulnerable if additional parsing is performed on input parameters.
- The application should have only minimal rights to perform operations in the database. It need not have DROP access if your application doesn’t have to drop a table.
- Validate your input. Do this on the server-side as client-side validation cannot be trusted. Write per-field validations using regular expressions that allow only whitelisted characters. Limiting the allowable range of characters and length of each field will reduce the chance of injection.
- I have mentioned escaping data before passing it on to the database. However, many developers perform input escaping. This kind of blanket escaping is generally effective. However, it could create problems if you need to pass data to other subsystems (example: sending an email), which may have different escaping needs. The ideal solution is to perform output escaping before passing data to another subsystem.
- Catch database exceptions and show a custom error page to the user. Show minimal information about the error to the user. Database exceptions carry more information than what the user needs to see – it may contain table and column names. So catch these exceptions in your application and redirect the user to an error page that does not give out much information.
This article gave some tips on how to avoid falling prey to SQL injection attacks. Only the application developer can prevent this type of attack. It cannot be avoided by installing firewalls or SSL certificates. Developers need to think about how to prevent injection attacks when they write code and not as an after thought.
Pitfalls in WordPress Version 2.6.1
Friday, August 21st, 2009Almost a year back (Aug 15th, ’08, to be precise), AUTOMATTIC released WordPress 2.6.1 fixing over 60 bugs. Also the version featured with the introduction of ‘right to left’ typing for Hebrew and Farsi language administrators. In a very short period of time (may be around one month), the company alerted 2.6.1 version users of security holes in using the same. Here, in this small article, we are going to analyze those vulnerabilities that made AUTOMATTIC release an upgrade for WordPress version 2.6.1 so soon.
Ok, let’s be clear and to the point. The problem is created by the nature of:
1. mt_rand () function of PHP and
2. the truncation method that MySQL adopts
mt_rand ():
PHP has two random number generating functions: rand (), mt_rand (). The former uses GNU C library and the latter uses Mersenne Twister algorithm. Mersenne Twister algorithm was created by Takuji Nishimura and Makoto Matsumoto of Japan. mt_rand () is predominantly used in most of the PHP applications and most importantly, WordPress 2.6.1 uses it.
Normally a seed is used to initiate the generation of random numbers. If it is possible to determine that seed, we will be able to generate the same sequence for any number of times. In other words, we will be able to hack the working of random generation. Seed can be determined using a lookup. Now, once the seed is found, anyone can generate the sequence that the application generates. If you want to know how this is possible, you got to learn random number generation in PHP or there’s an alternative: bow to the fact that it is the nature of mt_rand () function.
Now, make a request for admin password which would send an activation link to the actual admin. But since we have the seed, we will be able to calculate the same activation link by enabling Keep Alive HTTP request. Activating this link and using a different email ID in the form will allow creation of a new WordPress admin password and thereby complete control.
MySQL Truncation:
Let’s see the next one. When the string input given in a query is longer than the defined maximum length, MySQL, by default, truncates the string to the defined maximum length. For example, if the maximum value of the string column is defined to be 8 then, the input value, “qburst_expressions” will be truncated to “qburst_e”. There will be a warning displayed but, applications are normally not configured to handle those warnings. And importantly, WordPress version 2.6.1 was not.
Suppose I know the WordPress admin name, (let’s say, “godfrey”) and the maximum length of the username in MySQL is set as 32. When I register as a new user with the same name “godfrey”, obviously, MySQL will return an error as there already exists an username godfrey. Now, I try with “godfrey “(with 2 spaces after the name), MySQL will truncate the string to “godfrey” and again return an error due to the same reason. Suppose I try with “godfrey g” (with 25 spaces between godfrey and g) then MySQL will not be able to identify a similar username and also truncate the name to “godfrey” to be inserted into the database column. This happen because the username exceeds the defined maximum length of 32 and the system will not be able to find a match in the database. Now we have 2 admin usernames in the table. This is sufficient to pass the validation and gain access to the password of the original admin, thereby complete control.
| Username | Length | Max Length | After Truncation | Database Change |
|
“godfrey” |
7 |
32 |
“godfrey” |
No change |
|
“godfrey “ |
9 |
32 |
“godfrey” |
No change |
|
“godfrey g” |
33 |
32 |
“godfrey” |
Truncated string (godfrey) inserted as new username into DB |
These holes in security made AUTOMATTIC to work on an upgrade at the earliest. And the next release fixed all these errors. So if you are planning to use WordPress, make sure you use the latest version and remain safe. WordPress 2.8.4 is available for download now. It is the latest stable version of WordPress according to the AUTOMATTIC’s last release.
Collation in SQL
Thursday, June 11th, 2009Collation defines how data is sorted and compared in SQL. It is a set of rules that defines case-sensitivity, accent sensitivity, width-sensitivity etc. The most common use of collation is in case-sensitive search using SQL query. Let’s see how it proves to be useful with case-sensitivity.
Normally, it is impossible to perform a case-sensitive search in SQL. For example, see the following queries:
SELECT * FROM myTable WHERE myCompany=’QBurst’
SELECT * FROM myTable WHERE myCompany=’qburst’
Both queries will produce the same result when they are executed. But there are lots of cases where we want to differentiate the result based on search string case.
The necessity for this type of search has greatly increased, so Microsoft introduced a feature known as “Collation”. With collation, we can redefine the search scenario at the database level (rewriting the default collation of SQL server set during its installation). For the previous example, if we want to get results for “QBurst” only if we search with string “QBurst” and with no other case combination, we should write the query like:
SELECT * FROM myTable WHERE myCompany=’QBurst’ COLLATE SQL_Latin1_General_Cp437_CS_AS
Here the only thing you may not be familiar with will be “SQL_Latin1_General_Cp437_CS_AS “. This string is called Collation Name. Microsoft defines different combinations of collation names, with the combinations produced by using various supporting languages for accent-sensitivity and also with case-sensitive/case-insensitive options. You can have a detailed look on more collation names here:
http://msdn.microsoft.com/en-us/library/ms144250(SQL.90).aspx
http://msdn.microsoft.com/en-us/library/ms180175.aspx
The collation can also be applied in the parent levels such as when creating databases. The statement, “CREATE DATABASE myCollationDatabase COLLATE Latin1_General_Cp437_CS_AS”, makes the whole database arranged according to the collation specified.
The same can also be extended to applying different collations for different columns in a table. The code below shows an example:
CREATE TABLE myCollationTable
{
userName VARCHAR(20) COLLATE SQL_Latin1_General_Cp437_CI_AI,
userOccupation VARCHAR(30) COLLATE SQL_Latin1_General_Cp1_CI_AI,
userCompany VARCHAR(30) COLLATE SQL_Latin1_General_Cp1_CS_AS
}
Collation can be very efficient if used in a proper way. No major limitations have been reported yet for this feature. Now, you can try it for yourself.
Happy coding!
MySQL Replication
Thursday, May 7th, 2009Replication has now become an essential feature for most MySQL users. The good news I can share at the same time is that the working and implementation of this concept is also less complicated. It involves a minimum of 2 servers: a master and a slave (in most cases). The slave makes use of the binary logs created by the master to update its database thereby keeping both of them in exact synchronization.
Issues Leading to the Need of Replication
Heavy Load:
Lets consider a website with an exponentially increasing number of users regularly. There will arrive a state in which the single database server could no longer handle the load anymore. If the server receives more number of read queries rather than write queries (which will be the normal case for most of the websites) then, the best choice will be to adopt replication into the current architecture. Here, the read queries refer to SELECT statements and the write queries refer to INSERT, UPDATE and DELETE statements.
Now I am going to explain how replication solves the issue of heavy load. When the concept of replication is implemented we will be having more than one server. Among these servers, the one named as master will receive queries related to write and make changes to its database immediately. Consequently, when the binary log is updated, the slaves update their database reading from the log files. The slaves on the other hand receive all read queries. Depending on the number of queries received, the number of slaves can be increased or decreased. Now by using any scheduling algorithm (Round Robin is an example), we can effectively load balance the incoming read queries to different slave servers so that all of them get equal workloads.
Backup: Anytime and Without Client Disturbance
During backup requirements, we normally stop MySQL or lock the read queries to get an exact backup. This may sometimes result in the annoyance of the clients who access the website during the process. Although there are a few clever techniques with which you may do this without the notice of the clients, things become very simple with replication.
The slaves always remain in exact synchronization with the master. In other words, the slaves will have another copy of the entire repository that the master processes. And hence backing up of a slave is similar and as good as backing up of the master. Also the presence of slaves as exact replicas will in most cases help avoid the need for backup of the master. This is because we always have the slaves as a spare in case of any misfortunes to the master.
Distribution of Data Without Respect to Distance:
Next issue I am to focus on deals with distributing copies of data in various locations that are geographically very apart which is not a trivial task. But the replication factor gives the flexibility that we require to make it trivial.
The master provides no errors even if the slave remains disconnected for some time. So in spite of the poor connection and other factors that may influence the link between the different destinations, a synchronized copy of the master can be made to exist in a geographically distant region.
Architectures of Replication
There are a few rules that I recommend to be kept in mind to better understand the different architectures.
- There needs to be a unique server ID for every slave
- There can be many slaves for a master
- There can be only one master for a slave
- Slaves can also function as masters
Master: Slave
This architecture best suits an environment, which has low number of write queries and high number of read queries. Effective load balance can be achieved by spreading the workload among the different servers. Here is an illustration.
Dual Master:
This kind of architecture is particularly useful when servers are geographically far apart. Although during interruptions, neither will have access to their data both will catch up from each other when the connection is reestablished. An extension of this architecture will be to have a slave on either side that is also diagrammatically shown below.
.
Pyramid:
In a large organization where there is diverse distribution in a hierarchical manner, a possible architecture like this will be the best suit. There is no necessity to configure every slave with the master as the slave above in the hierarchy can act as their master.
Although replication solves problems, it demands so much precision, which if not taken care of, can even result in the crash of the master database. Slaves are not always in synchronized state with their master. But with proper monitoring systems this can be detected. The concept of replication is provided by MySQL and can definitely improve overall performance if dealt with proper caution.