How to address Schema Migration for Dummies

  1. Be very clear about what a PersistentEntity (an entity that directly corresponds to a DB table) is, and separate it from your business entities, which may or may not have a one-to-one relationship with the tables. Ideally, business entities should be defined on an as-needed basis, whereas PersistentEntities should be defined as soon as your schema / data model design is ready for the first iteration of development.
  2. Once you create the entities, you will annotate them; I have shared an example of such an entity below. These annotations should follow the JPA annotation standard rather than Hibernate-specific annotations, because that way you avoid vendor lock-in and follow a standard. JPA is a standard developed through the Java Community Process as a JSR (Java Specification Request) and is implemented by Hibernate, TopLink and other object-relational mapping providers.
  3. Add the Hibernate Synchronizer plugin to Eclipse (you can also use the schema exporter command-line utility, documented on the Red Hat Hibernate website). I highly recommend the Eclipse plugin because it makes the process visual and easy to follow.
  4. Use Hibernate Synchronizer to export the schema based on the PersistentEntities. This requires you to configure your persistence.xml file, which usually sits on the classpath, either inside a jar or directly under the classpath root. I will share some examples for demonstration.
  5. The schema exporter works by pointing to your database instance and generating all the tables you have defined as PersistentEntities, taking care of all the constraint definitions, indexes, etc. I will share a sample of that too.
  6. What you now have is v1 of your DB. This is ready to be used in your application by defining a DAO (Data Access Object) layer, which abstracts the operations performed on PersistentEntities. Note: as long as you have a good abstraction around the DAO, your persistent entities will mostly remain unchanged until you decide to add more columns or tables to your schema.
  7. Now for v2, if you are adding more tables and columns, you will start by modifying the PersistentEntities directly and adding new ones where required. Reminder: all the entity relationships are also defined using annotations, as you are already aware, so adding new tables only requires adding some new relationship annotations to existing entities if they are going to have a relationship with the newly added entity.
  8. Once you are happy with your v2 schema design and the PersistentEntities created from it, you will point to your development DB server and generate the new schema-based DB instance.
  9. In this step you will need a tool that can compare DB instance v1 with DB instance v2 and generate a migration script for the DDL part. For data that needs to be migrated, you will have to write your own migration scripts. Note: if you are using MySQL, you are lucky, because MySQL Workbench has great tools for schema comparison and auto-generation of migration DDL.
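Step 4 above mentions configuring persistence.xml. As a point of reference, a minimal sketch might look like the following; the unit name, entity class and connection properties are placeholders you would replace with your own:

```xml
<!-- Minimal persistence.xml sketch (JPA 1.0 era, using Hibernate as provider).
     All names and connection settings below are placeholders. -->
<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="1.0">
  <persistence-unit name="myapp" transaction-type="RESOURCE_LOCAL">
    <provider>org.hibernate.ejb.HibernatePersistence</provider>
    <class>com.rishik.hibernate.entity.FileEntity</class>
    <properties>
      <property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect"/>
      <property name="hibernate.connection.url" value="jdbc:mysql://localhost:3306/mydb"/>
      <property name="hibernate.connection.username" value="user"/>
      <property name="hibernate.connection.password" value="password"/>
      <!-- hbm2ddl lets Hibernate export/update the schema from the entities -->
      <property name="hibernate.hbm2ddl.auto" value="update"/>
    </properties>
  </persistence-unit>
</persistence>
```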


package com.rishik.hibernate.entity;

import java.io.Serializable;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.Lob;
import javax.persistence.ManyToOne;
import javax.persistence.Table;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

/**
 * This is an object that contains data related to the file table.
 */
@Entity
@Table(name = "file")
public class FileEntity implements Serializable {

    private static final long serialVersionUID = 1L;

    public static final String REF = "FileEntity";
    public static final String PROP_TYPE = "type";
    public static final String PROP_ID = "id";
    public static final String PROP_USER = "user";
    public static final String PROP_CONTENT = "content";
    public static final String PROP_WORKSPACE = "workspace";
    public static final String PROP_CREATED_DATE = "createdDate";
    public static final String PROP_LAST_EDITED_DATE = "lastModifiedDate";
    public static final String PROP_LAST_EDITED_BY = "lastModifiedBy";

    // constructors
    public FileEntity() {
        initialize();
    }

    /** Constructor for primary key */
    public FileEntity(java.lang.Long id) {
        this.id = id;
        initialize();
    }

    /** Constructor for required fields */
    public FileEntity(
            java.lang.Long id,
            java.lang.Long userId,
            java.lang.Long workspaceId) {
        this.id = id;
        this.user = new UserEntity(userId);
        this.workspace = new WorkspaceEntity(workspaceId);
        initialize();
    }

    protected void initialize() {}

    private int hashCode = Integer.MIN_VALUE;

    // primary key
    private java.lang.Long id = 0L;

    // fields
    private UserEntity user;
    private WorkspaceEntity workspace;
    private java.lang.String content;
    private java.lang.String type;
    private java.util.Date createdDate;
    private java.util.Date lastModifiedDate;
    private UserEntity lastModifiedBy;

    /**
     * Return the unique identifier of this class.
     */
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "file_id")
    public java.lang.Long getId() {
        return id;
    }

    /**
     * Set the unique identifier of this class.
     * @param id the new ID
     */
    public void setId(java.lang.Long id) {
        this.id = id;
        this.hashCode = Integer.MIN_VALUE;
    }

    /**
     * Return the value associated with the column: user_id
     */
    @ManyToOne
    @JoinColumn(name = "user_id")
    public UserEntity getUser() {
        return user;
    }

    /**
     * Set the value related to the column: user_id
     * @param user the user_id value
     */
    public void setUser(UserEntity user) {
        this.user = user;
    }

    @ManyToOne
    @JoinColumn(name = "workspace_id")
    public WorkspaceEntity getWorkspace() {
        return workspace;
    }

    public void setWorkspace(WorkspaceEntity workspace) {
        this.workspace = workspace;
    }

    /**
     * Return the value associated with the column: content
     */
    @Lob
    @Column(name = "content")
    public java.lang.String getContent() {
        return content;
    }

    /**
     * Set the value related to the column: content
     * @param content the content value
     */
    public void setContent(java.lang.String content) {
        this.content = content;
    }

    /**
     * Return the value associated with the column: type
     */
    @Column(name = "type")
    public java.lang.String getType() {
        return type;
    }

    /**
     * Set the value related to the column: type
     * @param type the type value
     */
    public void setType(java.lang.String type) {
        this.type = type;
    }

    /** @return the createdDate */
    @Temporal(TemporalType.TIMESTAMP)
    public java.util.Date getCreatedDate() {
        return createdDate;
    }

    /** @param createdDate the createdDate to set */
    public void setCreatedDate(java.util.Date createdDate) {
        this.createdDate = createdDate;
    }

    /** @return the lastModifiedDate */
    @Temporal(TemporalType.TIMESTAMP)
    public java.util.Date getLastModifiedDate() {
        return lastModifiedDate;
    }

    /** @param lastModifiedDate the lastModifiedDate to set */
    public void setLastModifiedDate(java.util.Date lastModifiedDate) {
        this.lastModifiedDate = lastModifiedDate;
    }

    /** @return the lastModifiedBy */
    @ManyToOne
    @JoinColumn(name = "last_modified_by")
    public UserEntity getLastModifiedBy() {
        return lastModifiedBy;
    }

    /** @param lastModifiedBy the lastModifiedBy to set */
    public void setLastModifiedBy(UserEntity lastModifiedBy) {
        this.lastModifiedBy = lastModifiedBy;
    }

    public boolean equals(Object obj) {
        if (null == obj) return false;
        if (!(obj instanceof FileEntity)) return false;
        FileEntity pOFile = (FileEntity) obj;
        if (null == this.getId() || null == pOFile.getId()) return false;
        return this.getId().equals(pOFile.getId());
    }

    public int hashCode() {
        if (Integer.MIN_VALUE == this.hashCode) {
            if (null == this.getId()) return super.hashCode();
            String hashStr = this.getClass().getName() + ":" + this.getId().hashCode();
            this.hashCode = hashStr.hashCode();
        }
        return this.hashCode;
    }

    public String toString() {
        return super.toString();
    }
}

Software Design – What comes first: Functionality or Usability?

Designers and developers have very different views on what a product should look like, how it should behave, and even what its sphere of influence should be.

I started building my design skills around designing classes and their relationships, service interfaces and software architecture. Gradually, I moved to designing UIs, since an increasing number of web applications were turning into RIAs, in both the enterprise and consumer worlds; the industry was moving in that direction. I focused on building applications with UIs that would be easy to use while maintaining the necessary complexity of the business requirements. Like any other developer's, my instincts were naturally tuned to designing applications for better code reuse, appropriate use of design patterns, generic functionality and extensibility. Everything but usability. Almost every software engineer I know has conventionally focused on functionality and the other things mentioned above.

But hey, what is more important – Functionality or Usability?

This question, to say the least, is unfair. It is usually software engineers who ask it the most. Why? Because for them, prioritizing their tasks is the biggest challenge. Most often, developers enter the scene too late in the product management life cycle, at development time, usually starting from low-level design. Most of the high-level design or problem solving is limited to people who are either too far removed from the ground realities of implementation or have too little bandwidth to delve into such issues. Their main focus is making the customer happy, and they seem to do their jobs pretty well, except that they end up making the engineer's job hell. An engineer is required to make all the tactical decisions with free will, yet under the constraint that the product they are building must remain highly usable as perceived by the business analyst, technical product manager, or customer.

Why do Engineers put Functionality first?

Engineers like rational thinking and naturally go for the optimal path in solving problems, and they like to solve the whole problem at once. They don't understand the concept of a "dumb user", because they see the world as being made of intelligent beings. They like to build products that are feature rich and have various layers of complexity, like the human mind. Yet they fail to realize that an ordinary user rarely thinks of a complex product as an intellectually rewarding puzzle; for her, every software product she uses is a means to an end. Engineers like to throw in a little extra functionality, hoping the user will some day, lost in their tracks as they usually are, hit upon a remote corner of the application, be pleased to find something useful to do there, and be awed by how deep the engineer's mind had penetrated. Users, on the other hand, mostly like to stay in the well-acquainted territory of known functionality, which is usually only 20% of the total functionality of a product, so they would rather have that 20% delivered with rock-solid robustness and tremendous ease of use.

When to bring in Usability?

In principle, as soon as you conceive the product. In reality, every development milestone should be a trigger for a usability test, just as it is for a sanity test. Usability is not just about decoration; it is about serving a pleasurable experience that users will want to return to, over and over again. Yet even though this sounds logical, usability is rarely part of development or design discussions amongst engineers. The only people you ever see discussing usability are designers, QA, product managers, sales and customer support. Which means that, except for developers, everybody else thinks about it. You might wonder why that is, and how it might be resolved.

Engineers need to Understand “Why?” before they start on “How?”

If a developer, while writing code, finds a situation that doesn't make sense, it probably really doesn't make sense; yet the developer, ever so engrossed in getting things done and churning out large volumes of code every minute, doesn't want to examine it. That is where usability is compromised the most. If developers understand what business purpose a feature serves, they can never build an unusable product.

Is that it?

Yes and no! For basic usability scenarios, those related to low-level controls and the basic interactions between them, a developer paying more attention can solve the problem.

In the next post I will discuss how usability can move higher up on an engineer's priority list.

A/B Testing – Knowing what the User Wants

A/B Testing?

Have you ever wondered how the Social Games like Farmville and MafiaWars become so viral and engaging? How Google, Amazon, eBay and Zynga are able to present to you the content at the right time and with the right keywords to grab your attention?

There is a user-testing process known as A/B testing that is employed by many giant IT companies that run their business online. It started as an email-marketing trick: present variations (one variable at a time) of the same content to different cohorts of users, and collect the responses and how they change across those variations.

For instance, if 10 users were sent emails about a flower shop in downtown San Jose, 5 of the users might see the picture of the flower shop at the top, and 5 users would see the same picture at the bottom. Data on which group of users chose to read the mail and follow the links more would then be collected, using the parametrized URLs embedded in the email, to determine which approach worked better. That information would then be used to create email templates with higher chances of being followed by the user. This stochastic analysis allows data analysts to create models based on the different variants of the component being tested, and to use these models to incorporate attractive features into the marketing tools and, eventually, the products being marketed. Here is a demo of what is actually tested: ABTestDemo.

The Science

To make it work, users shouldn't really know that they are being tested, as knowing may affect their natural reactions. Thus, only one of the two options is presented to a user at a time, and the user's reaction is captured to measure the effectiveness of that option for that particular user; a collection of responses over a larger user base allows analysts to determine how that cohort generally feels about the feature. A/B tests are very similar to psychometric tests, except that the goal is not to understand a particular user better but to establish behavioral trends in a large group of users.

Having said that, it is still a huge task to identify what should and can be tested, how the variations should be parametrized, which variations should be used for which cohort, and how to interpret the results collected for each variation used in the tests. Since both the scale and the magnitude of A/B testing can easily become intractable, A/B testing and the conclusions drawn from its results are essentially part of a stochastic process. That means the conclusions are not definite but probabilistic. In other words, what a test like the one shown above in ABTestDemo can tell us deterministically is that 8 out of 10 users like Option B more than Option A; it may not be able to tell us why 80% of users like one option over the other, since the users making the choices do not make them consciously, but are subconsciously driven towards or away from the conversion goal (where conversion may mean different things to different applications or websites). Therefore, the improvement from A/B testing depends on the testing team coming up with really compelling options for the users. If 'Option A' and 'Option B' of a test end up evoking the same or similar emotions, the test cannot really yield any worthwhile results. So choosing the right options for the tests, and choosing the right tests, becomes critical to the success of A/B testing.

The Technology

I came across A/B testing while working as the SDK architect at Playfirst Inc. I was developing the SDK and was assigned the additional responsibility of including A/B testing support in the games by means of the SDK. What I had was a hashing algorithm for creating basic percentage-based buckets to which the test variants were allocated. The architecture was very simple: there was a service written in PHP that would be invoked at deployment time to load the A/B variations and the distribution rules for the percentage of users allocated to one bucket or the other for each test. These rules, once created, were persisted in Memcache, so that when a user logged in to the game, their hash would be calculated using their unique id and some other input parameters. The calculated hash would then be used to place the user in one of the two A/B buckets, based on which percentage-distribution range they fell in. This allocation map for all the tests the user would undergo was then stored against the user's id in Memcache, so that on subsequent logins the valid tests would not be recalculated. However, the cached maps were invalidated every time the tests, the distribution or the rules changed on the server side. Once calculated, the bucket in which the user was placed would be used as an additional parameter on all subsequent requests from client to server, so that the server would send the metadata applicable to the user's allocated bucket option. The client would also have the intelligence to present the features being tested differently to the users and to measure their responses.
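The hash-based bucketing described above can be sketched roughly as follows. The MD5-based hash and the bucket boundaries are illustrative assumptions, not Playfirst's actual algorithm; the point is that the mapping is deterministic, so the same user always lands in the same variant across logins.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Deterministically maps a (userId, testName) pair to a bucket in [0, 100),
// so a user's A/B variant is stable across sessions without extra storage.
class AbBucketer {
    static int percentile(String userId, String testName) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] d = md.digest((userId + ":" + testName).getBytes(StandardCharsets.UTF_8));
            // Fold the first four digest bytes into an int, then into [0, 100).
            int h = ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                  | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
            return Math.floorMod(h, 100);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // E.g. a 50/50 split: percentiles 0-49 get variant A, 50-99 get B.
    static String variant(String userId, String testName, int percentA) {
        return percentile(userId, testName) < percentA ? "A" : "B";
    }
}
```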


I am a big fan of this method of testing and collecting data because it favors an unbiased and natural process of selection of the stronger candidate. I think this is the most natural way of evolution and is therefore more organic, since no abrupt, undesirable or hard-to-deal-with changes are introduced to the users. Quickly changing products cause more nervousness than confidence in users. Even though it can be a slightly slow and cumbersome process, it promises a much deeper insight into user behavior and into how the product is perceived generally. This method is also very tricky to implement, because identifying the areas to be tested and the tests to be defined takes an enormously wide perspective and can become a point of discord between the team members designing and implementing it. Also, I haven't come across any frameworks or cookie-cutter templates for implementing A/B testing; I guess the reason is that each product has different needs and is implemented differently, so it will take a while before patterns are identified and frameworks are created.



Writing Software with AI – Adaptive Programming

It is natural that even humans don't learn everything overnight; we have to spend a lot of time probing for information before we can acquire knowledge. That knowledge then becomes intelligence once we have processed it. Therefore, in my opinion, it might be worth building an AI system that asks questions, launches quests and goes after missing pieces of information whenever it is stuck on a new problem.

Adaptive programs

Adaptive programs would be a sort of software system that, while working on a complex problem, has the ability to simultaneously discover peers that might already know the solution to part of the problem, or that can give directions to another software system that might. To prevent infinite wait times after the second level of delegation, the software system would stop active probing and assign the responsibility of probing further to an asynchronous probing agent.

This network based software system collaboration can reduce the complexity of the software systems and provide for most optimal reuse of existing solutions to recurring problems. The central principle is that software systems will not just be used for computation, but for gathering knowledge on the problem sets they can solve directly or by delegation to peer software systems. The communication can be based on open protocols and human feedback can be used to refine solutions arrived at by these adaptive programs.

Deep Linking:Restore State of Flex Application

Dear Dummy Friends,

I know I have been keeping quite tight lipped for a long time and the reason for that is either I didn’t have anything to say or whatever I wanted to say was not coming out pretty well. However, when I discovered this feature I really hit myself on the head for not having read the documentation thoroughly before and felt an urgent need of sharing it with you guys.

Have you ever wondered why a Flex application couldn't be restarted from the last state the user closed the browser in?

Prepare to be liberated. Flex supports a feature called deep linking.

What is deep linking?

Deep linking is a method of making the flash application aware of the browser url and vice versa.

The first thing we learn while writing Flex applications is that there is no need to navigate over pages in the browser while using a Flash Player based application. So it is often misconstrued that Flash Player, or the Flex framework, shouldn't or doesn't care about providing any bridge between the state of the application and the state of the browser.

It is quite natural to believe that and to treat it as a constraint while developing Flex applications. However, the truth is that the Flex framework does care about this compatibility between browser navigation and in-Flex navigation, and you can see it by looking at the elaborate API the Flex framework offers. [Article: Deep Linking in Flex, LiveDocs: About Deep Linking]
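Conceptually, deep linking is just a two-way mapping between application state and the URL fragment. Here is a language-agnostic sketch of that mapping, written in Java; the key=value fragment format is illustrative, and real Flex code would use the BrowserManager API rather than hand-rolling this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Encodes/decodes application state as a URL fragment, e.g. "#view=book&page=42",
// so the browser's address bar (and back button) can mirror in-app navigation.
class FragmentState {
    static String encode(Map<String, String> state) {
        StringBuilder sb = new StringBuilder("#");
        boolean first = true;
        for (Map.Entry<String, String> e : state.entrySet()) {
            if (!first) sb.append('&');
            sb.append(e.getKey()).append('=').append(e.getValue());
            first = false;
        }
        return sb.toString();
    }

    static Map<String, String> decode(String fragment) {
        Map<String, String> state = new LinkedHashMap<>();
        String body = fragment.startsWith("#") ? fragment.substring(1) : fragment;
        if (body.isEmpty()) return state;
        for (String pair : body.split("&")) {
            String[] kv = pair.split("=", 2);
            state.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return state;
    }
}
```

On startup, the application reads the fragment and restores the screens it names; on every navigation, it rewrites the fragment, which is exactly what makes bookmarking and back/forward navigation work.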


This feature also adds a measure of backward compatibility for the old-school, browser-dependent generation that still thinks it is cool to navigate using only the back and forward buttons. I might add that I belong to that school of thought too.

A Cool Use Case

Apart from navigation, it allows you to bookmark certain screens in a Flex application. That might actually be a cool feature for applications that provide iPaper-like functionality. Say you were reading a book online on Scribd or an equivalent site, and suddenly you have to leave; you don't want to leave your computer on and would rather have it turned off, yet you still want to be able to come back to the same page of the eBook you were reading. Flex-based eBooks can support bookmarking to give them a real-life feel.


Restoring the state only applies to the visible part of the application, not to the hidden portions. This means some extra effort must be put in to ensure that browser-dependent navigation is accurately captured, and that it triggers all the state restoration in the same manner as if the application were going through its normal flow.

Before you leave!

There are various pitfalls in using this feature, so I would recommend strongly that you read the docs thoroughly and weigh the advantages over disadvantages before you support browser navigation.

Point to be noted

A good application differs from an average application only in setting the right expectations and meeting them consistently.

Happy Coding!


Flex Tip: Stop CSS from getting heavy!

Remember, friends: if you are using Flex Builder 3 and trying out various fonts for your widgets, every time you select a font it gets embedded into your CSS, and that makes your SWF heavier. You will have noticed that, over time, it takes longer and longer to compile.

To get rid of this problem, make sure you uncheck the Embedded checkbox right next to the font-selection drop-down. Every time you see it checked, just uncheck it if you don't want the font to be embedded.

There are standard fonts that do not require embedding, and they usually suffice for most business UIs.

So if you don't know why your Flex design view has become slower at applying style changes, take a look at the fonts that may have been embedded into the CSS accidentally.

Have fun with Flex and keep it simple!

Flex Video Stream Filtering – B&W, Brightness, Contrast and More…

Dear Dummy Friends,

I am back with some more trivia, just in case you have been struggling with attaching a video stream to your application, capturing images, changing the color (filtering), or manipulating the results of a video stream capture, here are some tips for you.

The first things you need to know in order to begin video or image filtering are:

  • An image or a video can be thought of as layers of color channels (red, green, blue and opacity).
  • The layers can be transformed using matrix transformations (ordinary mathematics).
  • Transparency, brightness, contrast and other properties of an image or a video can be manipulated by applying these transformations. (This is known as filtering.)
(Courtesy: Tutorial)

It is highly recommended that you visit the following links to better understand the concepts of image processing:

I am going to talk more about applying these transformations, or filtering, to video and images in Flex 3.3 programming.
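To make the color-matrix idea concrete, here is a small sketch in Java of the 4x5 matrix transform that Flash's ColorMatrixFilter applies to each pixel: every output channel is a weighted sum of the input channels [r, g, b, a] plus a constant offset. The grayscale weights and the clamping follow the usual convention; the class itself is illustrative.

```java
// Applies a 4x5 color matrix (the form used by Flash's ColorMatrixFilter)
// to a single RGBA pixel. Row i computes output channel i as a weighted
// sum of [r, g, b, a] plus the offset in the fifth column.
class ColorMatrix {
    static int[] apply(double[][] m, int[] rgba) {
        int[] out = new int[4];
        for (int row = 0; row < 4; row++) {
            double v = m[row][0] * rgba[0] + m[row][1] * rgba[1]
                     + m[row][2] * rgba[2] + m[row][3] * rgba[3] + m[row][4];
            // Clamp to the valid 8-bit channel range.
            out[row] = (int) Math.max(0, Math.min(255, Math.round(v)));
        }
        return out;
    }

    // Luminance-weighted black & white (grayscale) matrix.
    static final double[][] GRAYSCALE = {
        {0.299, 0.587, 0.114, 0, 0},
        {0.299, 0.587, 0.114, 0, 0},
        {0.299, 0.587, 0.114, 0, 0},
        {0,     0,     0,     1, 0}
    };

    // Brightness: identity matrix plus a constant offset in the last column.
    static double[][] brightness(double offset) {
        return new double[][] {
            {1, 0, 0, 0, offset},
            {0, 1, 0, 0, offset},
            {0, 0, 1, 0, offset},
            {0, 0, 0, 1, 0}
        };
    }
}
```

Applying such a matrix to every pixel of a video frame is exactly what the filtering step does; contrast, saturation and tinting are just different matrices.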

Keep tuned and get yourself ready with the understanding of color matrix.

Project KENAI and Drishta

I have just returned from the second day of Sun Tech Days 2009, Hyderabad. In the last few hours, the first thing I decided to do was to look up Project Kenai, a platform for collaborative, open-source product development with a host of social-networking features and a Web 2.0 style website that looks cool and feels cool too (and I am not referring to the color blue that is the theme color of the website). There are various things that might not appear quite user friendly to a usability expert, but for a developer trying to build a product in collaboration with the whole community of Java developers, the features are just about right to get started.

I have registered on the website and have been granted the permission to host my own project called Drishta. I am going to talk more about my project and the idea behind it in the following posts, but for now what I am very excited about is that I have finally decided to reignite this initiative that has been lying dormant in my TODO list for nearly a decade.

I have liked the idea of Kenai so far. It is simple. Straightforward. And easy to use. I would recommend giving it a fair try and making your own impression of this useful platform.

Keep tuned for Drishta!!!

Resource Update: GoF Design Patterns – Only notations, no text!

Sometimes a picture is indeed worth a thousand words. In particular, I always felt that the Design Patterns book was quite verbose and distracting in its approach. It would have been great if the examples had been a little more simplified and did not clutter the space. Also, a bit of comparative study of these patterns would allow one to appreciate the subtle nuances and differences between patterns that look quite alike to a newbie.

Finally, I struck gold when I found this link: an old site that contains pictorial (PepperSeed) images showcasing the design patterns. The notations are quite similar to UML notation, and there is astoundingly no text whatsoever!

I find this a very essential five-minute refresher that architects should keep handy for reference, for whenever a moment of uncertainty, while designing a complex application, renders them actionless.

This is the link to the site. Gang of Four Design Patterns

Here is a sample of the Adapter pattern and the Bridge pattern; when they are placed side by side, one can see the subtle difference clearly, without any textual clutter around it.

Adapter Pattern

Bridge Pattern

I hope you appreciate the essence of text less GoF Design Patterns!

P.S. The images were stolen from Gang of Four Design Patterns.

Bulky Data Transfer with Single Hit vs Light Data Transfer with Multiple Hits

This is a classic dilemma. What is the better model for browser-based clients and back-end services to interact: as little dialogue as possible with maximum data transfer, or as much dialogue as possible with minimum data transfer in each request-response exchange? The problem is not as simple to solve as the problem statement makes it sound.

There are various parameters that need to be considered for answering this question. And the solution varies as per the requirements.

Let’s start with considering a revolutionary example: Gmail!

As we are aware, Google employed the AJAX philosophy and created one of the first AJAX-based email applications, which changed the paradigm of web mail by increasing the performance of reading, composing and sending emails over the internet. The browsers were the same as before and the bandwidth was the same, but the application architecture had changed drastically, making more possible with the same set of resources.

However, it was not just about using a new technology but identifying the appropriate problem that this technology could solve. In an email type web application, a user generally performs units of tasks with every click.

  • Open a mail for reading,
  • Compose a mail/reply,
  • Add / Remove the attachments and
  • Send the mail.

It never happens that a user is reading a mail that is being modified by the sender simultaneously. So, there is no real time change in an email’s content.

If at all there is a modification required to the content of an email that was already sent/delivered, one needs to re-send the mail with the modifications. In this scenario, fetching each mail’s content as and when the header of the mail is clicked makes sense, and doing it asynchronously (i.e. without refreshing the whole page) makes the usability far more intuitive and elegant.

On the other hand, in an application that shows data that is highly likely to be modified at the same time a user is viewing it, such a model of conversation may not really work.

Say, you have an application that displays data in the grid where rows and columns are collapsible and contain levels of information (something like a hierarchical data set). See the example.

The grid's cells don't enjoy absolute independence from the other rows and columns of data surrounding them. In other words, each cell is not just an atomic piece of information in itself, but also forms part of the information at the level above it. For instance, a Physical Supply of 100 could mean there is a PO of 50, a POK of 30 and a TO of 20, which are all displayed as the child levels of Physical Supply. Now, if a new PO of 20 were created by some user while the data mentioned above is on display in the Supply Chain Profile (SCP), showing an additional 20 for PO, i.e. a PO of 70, would not be enough to represent the actual state of the supply chain, because the sum of quantities for the different documents (70+30+20 = 120) would be inconsistent with the total being shown for Physical Supply (50+30+20 = 100).
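The consistency constraint above can be sketched with a tiny model; the class and field names are illustrative, and the document names (PO, POK, TO) are taken from the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A parent cell (e.g. Physical Supply) whose displayed value must equal
// the sum of its child document quantities (PO, POK, TO in the example).
class SupplyCell {
    final Map<String, Integer> children = new LinkedHashMap<>();
    int displayedTotal;

    SupplyCell(int displayedTotal) {
        this.displayedTotal = displayedTotal;
    }

    int childSum() {
        return children.values().stream().mapToInt(Integer::intValue).sum();
    }

    // The inconsistency described above: updating one child without
    // recomputing the parent leaves the grid in a contradictory state.
    boolean isConsistent() {
        return childSum() == displayedTotal;
    }
}
```

Updating the PO child to 70 without refreshing the parent leaves the cell inconsistent (120 vs 100), which is exactly why a single-cell refresh is not enough.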

With this problem at hand the requirement would be to update all the cells in the row and column that contain the modified cell, as part of calculating the summary for that row and column respectively. This requirement in turn requires us to keep a track of all the cells in the grid because the probability of any cell getting modified at any given time is almost equal, therefore any row and column could be required to be refreshed with the latest data.

Essentially, each cell update leads to an update of almost all the cells in that section (sections here refer to the parent rows, viz. Demand, Supply, Recommendations, etc.). We could therefore limit a cell's update to updating just the section in which it lies. This, in turn, would lead to a different kind of inconsistency. For instance, a Demand of 30 and a Supply of 20 lead to a Recommendation of 10 and a Projected Inventory of -10. This means that even the sections within the SCP grid are interdependent; therefore, for any cell to show updated data, all the other cells in the grid would have to update their data, directly or indirectly.

So it is pretty clear that the data in the entire grid has to be treated as a whole in order to view consistent information in the Supply Chain Profile at any given time. The rendering of the data, however, doesn't need to happen in a single shot, because viewing a section, or a particular row, column or cell, is quite similar to checking one's inbox for a specific email: once that email is located, all one cares about is reading the information within. Similarly, a user will only follow a single path while drilling down into the SCP at any given time. This leads to a simple requirement of rendering data (cells, rows, columns) only for those particular paths.

The conclusion, or the solution to our problem, is to fetch the entire data set in a single hit to the server, and then use selective and efficient rendering of the data based on where the user clicks in the UI. We can, and have, tried to make the cell-specific information (shown in the pop-ups) asynchronous, but that requires holding the state of the data object that was transported to the client earlier, which adds memory and state-management requirements on the back end. Therefore, we chose to send the information in bulk to the front end. This might be slightly slower, but it represents the state of the SCP accurately and consistently, every time.