Dev Bytes: Performance

Showing posts with label Performance. Show all posts

Saturday, February 16, 2013

Android Thread Handling in Configuration Change

The Activity Recreating Issue During Configuration Change

When configuration change occurs in an Android device, e.g. rotating the screen from landscape to portrait mode, the Activity will be destroyed and recreated. This could introduce some issues if some tasks inside Activity are not completed during the configuration change. For instance a worker thread may still be running in background and leaking the memory during the configuration change. We assume that the worker thread here in discussion ties to Activity and will communicate back to Activity instance when it completes its task.

Let's take a look at following Activity code:

public class MainActivity extends Activity {
    private static final String TAG = MainActivity.class.getSimpleName();
    private static int instanceCount = 0;
    private Handler handler;
    private Thread thread;
    private TextView textView;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        instanceCount++;
        Log.d(TAG, "onCreate()");

        textView = (TextView)findViewById(R.id.textView1);
        textView.setText("Activity Instance " + String.valueOf(instanceCount));
            
        handler = new Handler() {
            @Override
            public void handleMessage(Message msg) {
                Log.d(TAG, "Handler thread - " + getThreadInfo());
            }
        };

        thread = new Thread(new Runnable() {
            @Override
            public void run() {
                Log.d(TAG, "Worker thread - " + getThreadInfo());
                try {
                    int count = 10;
                    while(count-- > 0) { // pause 10 seconds
                        Thread.sleep(1000); 
                    }
                    Log.d(TAG, "Worker thread sendMmessage to handler");
                    handler.sendEmptyMessage(0);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });
        thread.start();
    }    
    
    @Override
    protected void onDestroy() {
        Log.d(TAG, "onDestroy()");
        super.onDestroy();
    }

    private static String getThreadInfo()
    {
        Thread currentThread = Thread.currentThread();
        String info = String.format("%1$s ID: %2$d Priority: %3$s",  
                currentThread.getName(), currentThread.getId(), currentThread.getPriority());
        return info;
    }
}

A separate thread sleeps for 10 seconds to simulate a long-run task, then updates a UI view by a handler. If the the screen is rotated within 10 seconds, the activity will be recreated, so as a new thread and a new handler. However the old thread is still running in background, consuming resource and leaking the memory. The old Activity object will not be garbage collected at the time of destroy since the handler and thread are referencing it. The view switches from "Activity Instance 1" to "Activity Instance 2", and the LogCat shows:

Disabling Dangling Thread

The easiest method to resolve the issue is set a flag when the activity is destroyed to control the stale thread:

public class MainActivity extends Activity {
    private boolean stopThread = false;
    //...
    
    @Override
    protected void onCreate(Bundle savedInstanceState) {
    //...
        thread = new Thread(new Runnable() {
            @Override
            public void run() {
                Log.d(TAG, "Worker thread - " + getThreadInfo());
                try {
                    int count = 10;
                    while(count-- > 0 && !stopThread) { // pause 10 seconds
                        Thread.sleep(1000); 
                    }
                    if (!stopThread) {
                        Log.d(TAG, "Worker thread sendMmessage to handler");
                        handler.sendEmptyMessage(0);
                    }
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });
        thread.start();
        }
    }
    
    @Override
    protected void onDestroy() {
        Log.d(TAG, "onDestroy()");
        super.onDestroy();
        stopThread = true;
        handler.removeCallbacksAndMessages(null);
    }
    
    //...
 }

The LogCat logs:

Now the first worker thread is cancelled along with its partially completed task. To save the work by first thread, we can use onSaveInstanceState() callback to store the partial result, so later the second worker thread can use it as an initial start point, as described in this post.

Using Static Thread Object

The solution above is not perfect: multiple thread instances created during configuration change which is inefficient and expensive. We can use static thread variable to maintain one thread instance:

public class MainActivity extends Activity {
    private static final String TAG = MainActivity.class.getSimpleName();
    private static int instanceCount = 0;
    private static WorkerThread thread;
    private TextView textView;
    
    private Handler handler = new Handler() {
        @Override
        public void handleMessage(Message msg) {
            Log.d(TAG, "Handler thread - " + getThreadInfo());
        }
    };

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        instanceCount++;
        Log.d(TAG, "onCreate()");

        textView = (TextView)findViewById(R.id.textView1);
        textView.setText("Activity Instance " + String.valueOf(instanceCount));
        
        if (savedInstanceState != null && thread != null && thread.isAlive()) {
            thread.setHandler(handler);
        } else {
            thread = new WorkerThread(handler);
            thread.start();
        }
    }    
    
    @Override
    protected void onDestroy() {
        Log.d(TAG, "onDestroy()");
        super.onDestroy();
        handler.removeCallbacksAndMessages(null);
        if (thread.isAlive()) {
            thread.setHandler(null);
        }
    }

    private static String getThreadInfo()
    {
        Thread currentThread = Thread.currentThread();
        String info = String.format("%1$s ID: %2$d Priority: %3$s",  
                currentThread.getName(), currentThread.getId(), currentThread.getPriority());
        return info;
    }
    
    private static class WorkerThread extends Thread {
        private Handler handler;

        public WorkerThread(Handler handler) {
            super();
            this.handler = handler;
        }

        public void setHandler(Handler handler) {
            this.handler = handler;
        }

        @Override
        public void run() {
            Log.d(TAG, "Worker thread - " + getThreadInfo());
            try {
                int count = 10;
                while (count-- > 0) { // pause 10 seconds
                    Thread.sleep(1000);
                }
                if (handler != null) {
                    Log.d(TAG, "Worker thread sendMmessage to handler");
                    handler.sendEmptyMessage(0);
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}

Notice the extended WorkerThread class is also static to avoid memory leak, as in Java non-static inner and anonymous classes will implicitly hold an reference to their outer class. Now the LogCat logs:

As a side note, be cautious to use static variables within Activity to avoid memory leak. If you have to use the static variables, do not forget to cleanup the resources/references in the Activity.onDestroy() callback.

Using Fragment to Retain Thread

Another option, also the recommended way from Android Developer Guild, is to use Fragment with RetainInstance set to true to retain one instance of thread. The worker thread is wrapped into the non-UI Fragment:

public class ThreadFragment extends Fragment {
      private static final String TAG = ThreadFragment.class.getSimpleName();
      private Handler handler;
      private Thread thread;
      private boolean stopThread;

      public ThreadFragment(Handler handler) {
          this.handler = handler;
      }
      
      public void setHandler(Handler handler) {
          this.handler = handler;
      }
      
      @Override
      public void onCreate(Bundle savedInstanceState) { 
        Log.d(TAG, "onCreate()");
        super.onCreate(savedInstanceState);

        setRetainInstance(true); // retain one Fragment instance in configuration change
        
        thread = new Thread(new Runnable() {
            @Override
            public void run() {
                Log.d(TAG, "Worker thread - " + MainActivity.getThreadInfo());
                try {
                    int count = 10;
                    while(count-- > 0 && !stopThread) { // pause 10 seconds
                        Thread.sleep(1000); 
                    }
                    if (handler != null) {
                        Log.d(TAG, "Worker thread sendMmessage to handler");
                        handler.sendEmptyMessage(0);
                    }
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });
        thread.start();
      }

      @Override
      public void onDestroy() {
        Log.d(TAG, "onDestroy()");
        super.onDestroy();
        handler = null;
        stopThread = true;
      }
 }

The Fragment feature was added from Android 3.0 Honeycomb. For older versions you need to include the Android Support package (android-support-v4.jar) to get the Fragment work. With Fragment setup, the main activity will dynamically create or activate existence of ThreadFragment:

public class MainActivity extends Activity {
    private static final String TAG = MainActivity.class.getSimpleName();
    private static int instanceCount = 0;
    private ThreadFragment fragment;
    private TextView textView;
    
    private Handler handler = new Handler() {
        @Override
        public void handleMessage(Message msg) {
            Log.d(TAG, "Handler thread - " + getThreadInfo());
        }
    };

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        instanceCount++;
        Log.d(TAG, "onCreate()");

        textView = (TextView)findViewById(R.id.textView1);
        textView.setText("Activity Instance " + String.valueOf(instanceCount));
        
        FragmentManager fm = getFragmentManager();
        fragment = (ThreadFragment) fm.findFragmentByTag("thread");

        if (fragment == null) {
            fragment = new ThreadFragment(handler);
            fm.beginTransaction().add(fragment, "thread").commit();
        } else { // retained across configuration changes
            fragment.setHandler(handler);
        }
    }    
    
    @Override
    protected void onDestroy() {
        Log.d(TAG, "onDestroy()");
        super.onDestroy();

        fragment.setHandler(handler);
        handler.removeCallbacksAndMessages(null);
    }

    public static String getThreadInfo()
    {
        Thread currentThread = Thread.currentThread();
        String info = String.format("%1$s ID: %2$d Priority: %3$s",  
                currentThread.getName(), currentThread.getId(), currentThread.getPriority());
        return info;
    }
}

The LogCat result:

Thread Safety

The code snippets demoed about are not thread-safe. To make the code thread-safe, we can set the variable as volatile and wrap the setting inside a synchronized method so that only one thread updates the values at any time:

public class ThreadFragment extends Fragment {
    //...
    private volatile Handler handler;
    private volatile boolean stopThread;  
    //...
    
    public void setHandler(Handler handler) {
        synchronized( this.handler ) {
            this.handler = handler;
        }
        if (handler == null){
            requestStop();
        }
    }      

    public synchronized void requestStop() {
        stopThread = true;
    }
    
    thread = new Thread(new Runnable() {
        @Override
        public void run() {
           //...
           synchronized (handler) {
                if (handler != null) {
                    //...
                    handler.sendEmptyMessage(0);
                }
            }
        }
    }
      
    @Override
    public void onDestroy() {
        //...
        requestStop();
        handler = null;
    }
    
    //...
}

Above code avoids the scenarios like the work thread goes into the block after checking handler is not null, but right at that moment the handler is set to null by the main thread. However I am not so sure if such implementation is necessary. Unlike server side services that may be invoked by multiple callers at the same time, Android apps run locally so this kind of race conditions would rarely occur.

Saturday, December 08, 2012

Notes About Android Threading

Main thread, or UI thread, starts Android application, processes UI rendering and handles user interactions. Do NOT execute any long-running or heavy tasks inside main thread otherwise the UI would appear lagging and not responsive.
A Looper holds a message queue and implements infinite loop. It takes tasks one by one from the queue and executes them in sequence. When message queue is empty, the Looper is blocking and waiting for processing next queued task. A thread can associate with a Looper (only one to one relationship), then it becomes a Looper thread being able to process messages and tasks continuously, vs. regular thread that will die when completes its job in the run() method.
Handler is a bridge between Looper message queue and thread(s). Current thread or other threads can push runnable jobs to a Looper message queue by method handler.post() and its scheduled variance (postDelayed, postAtTime, etc.), or send a message to the queue by handler.sendMessage() method and its scheduled variance (sendMessageDelayed, sendMessageAtTime, etc.). Handler can also process the message by the Handler's handleMessage(Message) callback.
You can setup a thread with a Looper manually. Androdi also provides a handy class called HandlerThread for starting a new thread that already has a Looper attached. Not sure why it's not called LooperThread. It's a bit confusing as there's no Handler in a HandlerThread object. You always need to create Handler for a HandlerThread:
```
    HandlerThread  handlerThread = new HandlerThread("Thread name");
    handlerThread.start();
    Handler handler = new Handler(handlerThread.getLooper());
```
Main thread is a Looper thread. All UI interactions are pushed to main thread's Looper message queue and then are processed one by one. Configuration change request, such as orientation rotation, is just a special type of message sent to main thread's message queue.
Only main thread can update UI elements safely. Worker thread or background thread can use following means to work on UI elements:
- Implement main thread Handler, then you can update UI inside handler.handleMessage() callback, and post UI-related task to handler.Post() method or their variance.
- Use View.Post(Runnable). Android View objects have tied to a default Handler inside UI thread so you can post UI-related runnable jobs to it directly.
- Use Activity.runOnUiThread(Runnable) method. Android Activity class has this runOnUiThread helper method to update the UI.
- Use AsyncTask. The AsyncTask implementation takes the advantage of Thread pooling concept and provides a simple, understandable interface. Simply run background task inside the doInbackground() method, and run UI-related work inside the onPreExecute(), onProgressUpdate() and onPostExcute() methods. But be aware of the performance penalty of using it, see next.

AsyncTask thread has low priority of Process.THREAD_PRIORITY_BACKGROUND. With such low priority the total CPU consumption of all AysncTasks together is less than 10 percent of overall CPU power. This may cause some performance issue. You can change this behavior by changing its priority:

    new AsyncTaskClass().executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR); 
    //...
    class AsyncTaskClass extends AsyncTask<...> {
        //...
        protected Void doInBackground() {
            Thread.currentThread().setPriority(Process.THREAD_PRIORITY_DEFAULT);
            //...
        }
    }

On the other hand the HandlerThread uses Process.THREAD_PRIORITY_DEFAULT priority by default, but you can specify HandlerThread's priority from its constructor, e.g. setting a low priority background:
```
HandlerThread bgThread = new HandlerThread("Background thread", Process.THREAD_PRIORITY_BACKGROUND);
```
The ExecutorService is more powerful and can manage a pool of threads. Go for it if you want to have full control of your threads.
Usually worker thread is created from Activity. But it can also run in Service context (thread object created inside Service) without UI connection. Note that Service still runs in main UI thread by default if it starts from the Activity, but Service is not bound to Activity's life-cycle. IntentService simply extends Service and implements an internal HandlerThread object, so you can guarantee the job passed to IntentService's onHandleIntent callback runs in a separate worker thread.
There're a few ways to deal with recurring or repeating tasks. One common approach is keep posting to Activity's handler by using Handler.postDelayed() inside the runnable task:
```
    int recurInSeconds = 5;
    Handler handler = new Handler();
    handler.postDelayed(runnable, 0); // start the task
    Runnable runnable = new Runnable() {
       @Override
       public void run() {
          doTask(); 
          handler.postDelayed(this, recurInSeconds * 1000); // repeat the task
       }
    };
```
Without Activity context we can use following methods to achieve the goal:
- AlarmManager.setRepeating()/setInexactRepeating() to repeatedly start a Broadcast/Service in which the job is handled. For longer sleep intervals this is the preferred mechanism. It's handled by system and is only triggered at the time arrives thus consuming less power.
- ScheduledThreadPoolExecutor.scheduleWithFixedDelay()/scheduleWithFixedDelay() to repeatedly run a task.
```
ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
executor.scheduleAtFixedRate(runnable, 0, recurInSeconds, TimeUnit.SECONDS);
```
- Use Timer.scheduleAtFixedRate() to do the recurring TimerTask. This is not the recommended way described from Android development guild.
Anders Göransson's Efficient Android Threading slides for DroidCon are very informative. It's the best reference on the topic of Android threading I have found so far.

Friday, February 04, 2011

ASP.NET Cache, MemoryCache and Velocity

Caching is one of the most effective ways to improve performance for high-traffic/high-volume service applications. Caching is considered as a common practice to deal with data access with heavy I/O. ASP.NET Cache from System.Web dll is the only one memory caching implementation you can use in .NET 1.0/2.0/3.0 framework. It's there for so long that we almost forget the question at the very beginning: why put it into the Web context? You have to add System.Web reference even though you are working on class libraries or WinForm applications. Anyway people are just used to this bad design.

Finally Microsoft has a new ObjectCache/MemoryCache caching system from System.Runtime.Caching.dll in the latest version of the .NET 4.0 Framework. The API and usage is similar but it makes more senses now. We can use the MemoryCache in any application (I found it's not applicable to Console apps) without dependency on System.Web. One noticeable change in the new Cache API is that it has only one Add method, and gets rid of the confusion from two Add and Insert methods in ASP.NET Cache. Here's some note from MSDN describing the difference between MemoryCache and traditional ASP.NET Cache:

The MemoryCache class is similar to the ASP.NET Cache class. The MemoryCache class has many properties and methods for accessing the cache that will be familiar to you if you have used the ASP.NET Cache class. The main differences between the Cache and MemoryCache classes are that the MemoryCache class has been changed to make it usable by .NET Framework applications that are not ASP.NET applications. For example, the MemoryCache class has no dependencies on the System.Web assembly. Another difference is that you can create multiple instances of the MemoryCache class for use in the same application and in the same AppDomain instance.

How about that distributed caching system with code name of "Velocity"? It has been re-branded and becomes part of the Windows Server AppFabric - version 1.0 officially released a few months ago. The origin of "Velocity" project was a .NET version of MemCached. I remember our company was considering using "Velocity" three years ago (CTP version at that time). Our lead said let's wait for the mature final version. Now Microsoft eventually makes it the final 1.0 version after four years of work. What a super velocity! It took a new graduate student Brad Fitzpatrick a few months to create the MemCached solution for his LiveJournal site, and now it's pretty much the standard in the OpenSource world. It takes so long for Microsoft to develop a .NET component that's crucial for big systems and enterprise environments. A deep sigh for Microsoft.

Related to .NET caching there's an interesting post talking about loading different cache systems dynamically, including ASP.NET Cache, MemoryCache and AppFabric Cache. The implementation is simple and it could be helpful in some scenario.

Wednesday, January 05, 2011

Good to be Lazy<T>?

Just read Dino Esposito's article Don’t Worry, Be Lazy about the new new Lazy<T> class in .NET Framework 4.0. For short it's a wrapper class to simplify your work for the "lazy loading" approach.

Sometimes we don't want to populate all related data for a complex object at one shot. Ideally, for the performance consideration, we should only retrieve data that are needed. A class is usually abstracted from business model, for instance a Customer object would have properties of ID, Name, Address, Purchased Orders, etc. Populating all these fields could be costly and may require a few expensive calls to backend system. It's possible that only a small portion of an object data is used when requesting a Customer object. Why those purchased orders data are filled when only Customer's name is in interest? Lazy<T> is designed to help for such scenarios: delay loading data until they are actually requested.

Following code example demos such usage. A User class has an Address property, and the Address data are only loaded once when Address property is accessed:


class Program
{
    class Address
    {
        public string Location { get; set; }
        public string GeoCode { get; set; }
        public Address(string loc, string geo)
        {
            this.Location = loc;
            this.GeoCode = geo;
        }
    }

    class User  // User has Address property which is lazily loaded
    {
        public string ID { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }

        public User(string id, string fName, string lName)
        {
            this.ID = id;
            this.FirstName = fName;
            this.LastName = lName;
            _repoAddress = new Lazy<Address>(GetUserAddress);
        }

        private Lazy<Address> _repoAddress = null;  // Lazy container loaded from repository
        private Address _setAddress = null;         // Container to setter
        public Address Address
        {
            get
            {
                if (_setAddress == null)        // Only get repository data if not being set
                    return _repoAddress.Value;
                else
                    return _setAddress;
            }
            set
            {
                _setAddress = value;
            }
        }
        private Address GetUserAddress() // Method to load the address data from repository
        {
            Console.WriteLine("...... Loading address for user {0} ......", this.ID);
            return UserService.GetUserAddress(this.ID);
        }
    }

    static List<Lazy<User>> _repoUserList = new List<Lazy<User>>();             // Fake User repository
    static List<Lazy<Address>> _repoAddressList = new List<Lazy<Address>>();    // Fake Address repository
    static void MockData()  // Mock some sample data
    {
        for (int i = 1; i <= 10; i++)
        {
            string fakeID = i.ToString();
            _repoUserList.Add(new Lazy<User>(() =>
                {
                    Console.WriteLine("... Mocking User {0} ...", fakeID);
                    return new User(fakeID, "FName" + fakeID, "LName" + fakeID);
                }));
            _repoAddressList.Add(new Lazy<Address>(() =>
                {
                    Console.WriteLine("... Mocking Address {0}...", fakeID);
                    return new Address("Location" + fakeID, "GeoCode" + fakeID);
                }));
        }
    }

    class UserService   // Fake Service
    {
        public static Address GetUserAddress(string ID)
        {
            var addr = _repoAddressList.FirstOrDefault(a => a.Value.GeoCode == "GeoCode" + ID);
            return (addr == null) ? null : new Address(addr.Value.Location, addr.Value.GeoCode);
        }
        public static User GetUserByID(string ID)
        {
            var user = _repoUserList.FirstOrDefault(u => u.Value.ID == ID);
            return (user == null) ? null : new User(user.Value.ID, user.Value.FirstName, user.Value.LastName);
        }
    }

    static void Main()
    {
        MockData();
        User user = UserService.GetUserByID("5");
        Console.WriteLine("The user's name: {0} {1}", user.FirstName, user.LastName);
        Console.WriteLine("The user's address: {0} {1}", user.Address.Location, user.Address.GeoCode);
        user.Address = new Address(user.Address.Location + "(new)", user.Address.GeoCode + "(new)");
        Console.WriteLine("The user's updated address: {0} {1}", user.Address.Location, user.Address.GeoCode);
        Console.Read();
    }
}

Result:

... Mocking User 1 ...
... Mocking User 2 ...
... Mocking User 3 ...
... Mocking User 4 ...
... Mocking User 5 ...
The user's name: FName5 LName5
...... Loading address for user 5 ......
... Mocking Address 1 ...
... Mocking Address 2 ...
... Mocking Address 3 ...
... Mocking Address 4 ...
... Mocking Address 5 ...
The user's address: Location5 GeoCode5
The user's updated address: Location5(new) GeoCode5(new)

Above demo code searches a DAO User object from the mock data list, whose Address property is not loaded until its value gets accessed. Notice the private lazy<Address> property is only used in the getter method, and would be overridden by setter value so the custom value can be saved and passed back to service layer for further process.

A LINQ expression can be used to initiate the Lazy<T> as shown in the mock data population. We can see only the first 5 mocked Users were created because the fifth User was returned and the rest 5 mocked Users were not being accessed at all.

The last note is that all Lazy collections in above demo code are not thread-safe. To make Lazy<T> thread-safe, you need to pass a LazyExecutionMode parameter to its constructor, for detail refer to MSDN.

Thursday, April 30, 2009

SQL Server 2008 Spatial Data And Spatial Search

SQL Server 2008 offers a great support for spatial data and spatial search. Spatial data, or geo-spatial data, means locations on the Earth. Microsoft introduces two new data types in SQL Server 2008 to present location data:

Geometry: Geometry data are based on a flat Earth model, and it doesn’t account for the Earth’s curvature.

Geography: Geography data are Points, lines, polygons, or collection of these in latitude and longitude coordinates using a round Earth model.

What’s good about SQL Server 2008 spatial data? The location information can be stored as native Geography data and indexed, so that the location-based search can be efficiently done by SQL Server 2008. I examined such spatial search on a test database with 1-million rows of test data. The result is quite impressive.

Red Gate SQL Data Generator (V1.028) is used to create 1-million test data. Following script is to mock valid latitude and longitude data inside U.S.:


-- UDF Can't use RAND directly. Using View to cheat it
CREATE VIEW vRandNumber AS SELECT RAND() as RandNumber
GO
-- UDF function returns a float number in a given range
CREATE FUNCTION RandFloat(@Min float, @Max float)
RETURNS float AS
 BEGIN
 RETURN @Min + (select RandNumber from vRandNumber) * (@Max-@Min)
 END
GO
-- Update Address with U.S. geo codes
UPDATE Address 
SET Address.Latitude = dbo.RandFloat(32, 48), 
    Address.Longitude = dbo.RandFloat(-124, -72)

With valid Latitude and Longitude data, now we can create a Gepgraphy column in the database:

ALTER TABLE Address ADD GeoPoint Geography NULL
GO
UPDATE Address SET GeoPoint = geography::Point(Latitude, Longitude, 4326)
GO

Querying on this un-indexed Geopgraphy column is very slow. Search locations within a 10-KM radius takes more than 5 minutes:

SET @point = geography::Point(47.32, -122.18, 4326) -—Seattle
SELECT * FROM Address WHERE GeoPoint.STDistance(@point) < 10 * 1000
Result: return 85 rows in 5 minutes 48 seconds

Once the Spatial index is created, the search is much faster than before and the same search only takes 1 second:

CREATE SPATIAL INDEX SIX_Address_GeoGraphy ON Address(GeoPoint);
GO
SELECT * FROM Address WHERE GeoPoint.STDistance(@point) < 10 * 1000
GO
Return 85 rows in 1 second

The test was conducted inside a virtual machine with Intel 2.2G Core2Duo CPU and 2G of RAM.

Wednesday, July 16, 2008

The Cost Of Linq And IQueryable

The Language Integrated Query (Linq) is part of the .NET 3.5. We can easily do Linq queries against IEnumerable objects using very concise syntax by passing a delegate or the new Lambda expression. You can even create your own Linq Provider and implement the IQueryable interface, building something like Linq to Google or Linq to MySql.

It's super great, right? Basically yes, but remember a lot of new features from Microsoft have a cost. In my previous post we discussed the Linq deferred execution issue. Today let's do some tests to see how expensive the Linq queries are. Below is the simple test code. We build a simple key/Value collection (array). Because all collections including array have implemented IEnumerable interface, we can query the objects inside the array with Linq. Furthermore, we can convert the array object to be IQueryable and do query over IQueryable.

using System;
using System.Linq;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // A simple class presenting key/value pair
    class KeyValue
    {
        public KeyValue(string key, int value)
        {
            Key = key;
            Value = value;
        }
        public string Key { get; set; }
        public int Value { get; set; }
    }

    // Simple linear search 
    static KeyValue LinearSearch(KeyValue[] keyValues, string key)
    {
        foreach (KeyValue item in keyValues)
        {
            if (item.Key == key)
                return item;
        }
        return null;
    }

    /// <summary>
    /// Compare search performance with three methods:
    /// 1. Manually go through each object inside collection
    /// 2. Linq search by IEmuerable<T>.Where(lambda_expression);
    /// 3. Convert to IQueryable and then IQueryable<T>.Where(lambda_expression)
    /// </summary>
    /// <param name="keyValues">The array to be search on</param>
    /// <param name="key">Search object by KeyValue.Key</param>
    /// <param name="repeat">Number for repeating the search</param>
    static void SearchTest(KeyValue[] keyValues, string key, int repeat)
    {
        Stopwatch watch = new Stopwatch();

        // Test linear search
        watch.Start();
        for (int i = 0; i < repeat; i++)
            LinearSearch(keyValues, key);
        watch.Stop();
        Console.WriteLine("Linear search over {0} objects (repeating {1} times) takes {2} ms",
            keyValues.Count(), repeat, watch.ElapsedMilliseconds);
        watch.Reset(); watch.Start();

        // Test Linq search
        for (int i = 0; i < repeat; i++)
            keyValues.FirstOrDefault(item => item.Key == key);
        watch.Stop();
        Console.WriteLine("Linq search over {0} objects (repeating {1} times) takes {2} ms",
            keyValues.Count(), repeat, watch.ElapsedMilliseconds);

        // Test IQueryable search
        IQueryable<KeyValue> queryables = keyValues.AsQueryable();
        watch.Reset(); watch.Start();
        for (int i = 0; i < repeat; i++)
            queryables.FirstOrDefault(item => item.Key == key);
        watch.Stop();
        Console.WriteLine("Linq IQueryable search over {0} objects (repeating {1} times) takes {2} ms",
            keyValues.Count(), repeat, watch.ElapsedMilliseconds);
        Console.WriteLine();
    }


    static void Main()
    {
        // Build test data
        KeyValue[] keyValues1 =
            Enumerable.Range(0, 1000).Select(i => new KeyValue(i.ToString(), i)).ToArray();
        KeyValue[] keyValues2 =
            Enumerable.Range(0, 2000).Select(i => new KeyValue(i.ToString(), i)).ToArray();
        KeyValue[] keyValues3 =
            Enumerable.Range(0, 3000).Select(i => new KeyValue(i.ToString(), i)).ToArray();

        // Start Test
        SearchTest(keyValues1, "5000", 1000);
        SearchTest(keyValues2, "10000", 1000);
        SearchTest(keyValues3, "20000", 1000);

        Console.Read();
    }
}

Result:

As expected Linq to objects is slightly slower than the regular linear search, and we can assume Linq uses linear search internally. But IQueryable search is much slower than others. Why that happened? Not like Linq to object where Lambda expression is passing as a function delegate, Lambda expression passed to the IQueryable is converted to Expression tree which is a quite expensive.

The interesting thing is that explicit delegate passed to IQueryable won't be converted to Expression tree. In following code there's no noticeable performance difference between the two queries.

    KeyValue keyValueObj = keyValues.FirstOrDefault(
        delegate(KeyValue obj) { return obj.Key == key; });
    IQueryable<KeyValue> queryables = keyValues.AsQueryable();
    keyValueObj = queryables.FirstOrDefault(
        delegate(KeyValue obj) { return obj.Key == key; });

Conclusion: Linq to objects are working fine for .NET collections, but be cautious to make a collection as IQueryable (collection.AsQueryable() method) if you are using Lambda expression inside the IQueryable queries, because there's performance degradation for it.

Thursday, March 29, 2007

Thread Safety In .NET Collections

Most .NET collections are not thread-safe. Take generic list as an example, the MSDN description is as following:

Thread Safety

Public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.

A List<T> can support multiple readers concurrently, as long as the collection is not modified. Enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with one or more write accesses, the only way to ensure thread safety is to lock the collection during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.

So we need to implement read/write locks in multi-threaded environment when the list objects are modified. That's a lot of extra work and is prone to deadlock in a complicate system.

Is there any easy way on this? Can we build a new collection and switch the list reference to the new constructed collection? Following code demos such approach:

using System;
using System.Collections.Generic;
using System.Threading;

class Test
{
    public class Person
    {
        public string Name;
        public int Age;
        public Person(string name, int age)
        {
            Name = name;
            Age = age;
        }
    }

    private static List<Person> _personList = new List<Person>();
    static void Main()
    {
        // Build initial list
        for (int i = 0; i < 10; i++)
        {
            _personList.Add(new Person("Name" + i.ToString(), i));
        }

        new Thread(new ThreadStart(LoopthroughList)).Start();

        //Following thread will throw InvalidOperationException:
        //Exception: Collection was modified; enumeration operation may not execute.
        //new Thread(new ThreadStart(UpdateListUnsafely)).Start();

        // Following thread won't throw exception 
        new Thread(new ThreadStart(UpdateListSafely)).Start();

        Console.Read();
    }

    static void LoopthroughList()
    {
        Console.WriteLine("First loop");
        foreach (Person person in _personList)
        {
            Console.WriteLine("ListHashCode:{0} ListCount:{1} Name:{2} Age:{3}",
                _personList.GetHashCode(), _personList.Count, person.Name, person.Age);
            Thread.Sleep(100);
        }
        Console.WriteLine("Second loop");
        foreach (Person person in _personList)
        {
            Console.WriteLine("ListHashCode:{0} ListCount:{1} Name:{2} Age:{3}",
                _personList.GetHashCode(), _personList.Count, person.Name, person.Age);
        }
    }

    static void UpdateListUnsafely()
    {
        Thread.Sleep(500);
        _personList.RemoveAt(_personList.Count - 1);
        _personList.Add(new Person("A New Person", 20));
    }

    static void UpdateListSafely()
    {
        Thread.Sleep(500);
        List<Person> newList = new List<Person>(_personList);
        newList.RemoveAt(newList.Count - 1);
        newList.RemoveAt(newList.Count - 1);
        newList[newList.Count - 1].Name = "Modified Name";
        newList.Add(new Person("A New Person", 20));
        _personList = newList;
        Console.WriteLine("List updated! ListCount:{0}", _personList.Count);
    }
}

Result:

The result looks promising. We don't see any error when switching the reference (theoretically we should lock this operation). Is such implementation really thread-safe now? Maybe yes maybe no. The problem is that I couldn't find an official answer from Microsoft.