Wednesday, January 20, 2010

windbg sos.dll version issue

I debugged a .net 1.1 based windows application which exits silently upon start up. The problem itself is trivial and not worth mentioning. What I want to say is there is a subtle point about sos.dll version.

When I was debugging, I started the application under windbg. Then issue ".loadby sos mscorwks" command to load sos.dll extension corresponds to the running .net framework. And I entered !DumpAllExceptions command which should exist in sos.dll for .net framework 1.1, but ended in not finding this command:
   No export DumpAllExceptions found
Finally, I had to use "!DumpHeap -type Exception" to find out all exceptions.

Having done some investigation, I found there are two sos.dll files for .net 1.1. One in .net framework installation folder, and one in windbg installation folder. The latter one is a full featured extension and support DumpAllExceptions command.
I tried debugging the application again with sos.dll comes with windbg by issusing: ".load windbg_installation_folder/clr10/sos.dll". This time, DumpAllExceptions was back to life and worked like a charm.
BTW, an alternative way to do !DumpAllExceptions is to take advantage of .foreach command.
   .foreach(exception {!DumpHeap -type Exception -short}) {!do exception; .echo print exception done !!! *****************}

For convenience, below are commands supported by different version sos.dll.

0:000> !help
SOS : Help
COMState             | List COM state for each thread
ClrStack             | Provides true managed stack trace, source and line numbers.
                       Additional parameters: -p[arams] -l[ocals] -r[egs] -a[ll].
DumpClass      | Dump EEClass info
DumpDomain []  | List assemblies and modules in a domain
DumpHeap [-stat] [-min 100] [-max 2000] [-mt 0x3000000] [-type ] [-fix] [start [end]] | Dump GC heap contents
DumpMD         | Dump MethodDesc info
DumpMT [-MD]   | Dump MethodTable info
DumpModule     | Dump EE Module info
DumpObj        | Dump an object on GC heap
DumpStack [-EE] [-smart] [top stack [bottom stack] | -EE only shows managed stack items.
DumpStackObjects [top stack [bottom stack]
DumpVC    | Dump a value class object
EEHeap [-gc] [-win32] [-loader] | List GC/Loader heap info
EEStack [-short] [-EE] | List all stacks EE knows
EEVersion            | List mscoree.dll version
FinalizeQueue [-detail]     | Work queue for finalize thread
GCInfo [] [IP]   | Dump GC encoding info for a managed method
GCRoot         | Find roots on stack/handle for object
IP2MD          | Find MethodDesc from IP
Name2EE | Find memory address of EE data given a class/method name
ObjSize []     | Find number of bytes that a root or all roots keep alive on GC heap.
ProcInfo [-env] [-time] [-mem] | Display the process info
RWLock [-all] | List info for a Read/Write lock
SyncBlk [-all|#]     | List syncblock
ThreadPool           | Display CLR threadpool state
Threads              | List managed threads
Token2EE  | Find memory address of EE data for metadata token
u [] [IP]        | Unassembly a managed code

{windbg installation folder}\clr10\sos.dll

0:000> !help
Did you know that a lot of exceptions (!dumpallexceptions) can cause memory problems. To see more tips, run !tip.
SOS is a debugger extension DLL designed to aid in the debugging of managed
programs. Functions are listed by category, then roughly in order of
importance. Shortcut names for popular functions are listed in parenthesis.
Type "!help " for detailed info on that function.

Object Inspection                  Examining code and stacks
-----------------------------      -----------------------------
DumpObj (do)                       Threads (t)
DumpAllExceptions (dae)            CLRStack
DumpStackObjects (dso)             IP2MD
DumpHeap (dh)                      U
DumpVC                             DumpStack
GCRoot                             EEStack
ObjSize                            GCInfo
FinalizeQueue                      COMState
DumpDynamicAssemblies (dda)        X
DumpField (df)                     SearchStack
TraverseHeap (th)

Examining CLR data structures      Diagnostic Utilities
-----------------------------      -----------------------------
DumpDomain                         VerifyHeap (vh)
EEHeap                             DumpLog
Name2EE                            FindAppDomain
SyncBlk                            SaveModule
DumpASPNETCache (dac)              SaveAllModules (sam)
DumpMT                             GCHandles
DumpClass                          GCHandleLeaks
DumpMD                             FindDebugTrue
Token2EE                           FindDebugModules
EEVersion                          Bp
DumpSig                            ProcInfo
DumpModule                         StopOnException (soe)
ThreadPool (tp)                    TD
ConvertTicksToDate (ctd)           Analysis
ConvertVTDateToDate (cvtdd)        Bl
RWLock                             CheckCurrentException (cce)
DumpConfig                         CurrentExceptionName (cen)
DumpHttpRuntime                    ExceptionBp
DumpSessionStateConfig             FindTable
DumpBuckets                        LoadCache
DumpHistoryTable                   SaveCache
DumpRequestTable                   ASPXPages
DumpCollection (dc)                DumpGCNotInProgress
DumpDataTables                     CLRUsage
DumpLargeObjectSegments (dl)    
DumpAssembly                       Other
DumpMethodSig                      -----------------------------
DumpRuntimeTypes                   FAQ
DumpXmlDocument (dxd)

0:000> !help
SOS is a debugger extension DLL designed to aid in the debugging of managed
programs. Functions are listed by category, then roughly in order of
importance. Shortcut names for popular functions are listed in parenthesis.
Type "!help " for detailed info on that function.

Object Inspection                  Examining code and stacks
-----------------------------      -----------------------------
DumpObj (do)                       Threads
DumpArray (da)                     CLRStack
DumpStackObjects (dso)             IP2MD
DumpHeap                           U
DumpVC                             DumpStack
GCRoot                             EEStack
ObjSize                            GCInfo
FinalizeQueue                      EHInfo
PrintException (pe)                COMState
TraverseHeap                       BPMD

Examining CLR data structures      Diagnostic Utilities
-----------------------------      -----------------------------
DumpDomain                         VerifyHeap
EEHeap                             DumpLog
Name2EE                            FindAppDomain
SyncBlk                            SaveModule
DumpMT                             GCHandles
DumpClass                          GCHandleLeaks
DumpMD                             VMMap
Token2EE                           VMStat
EEVersion                          ProcInfo
DumpModule                         StopOnException (soe)
ThreadPool                         MinidumpMode
DumpMethodSig                      Other
DumpRuntimeTypes                   -----------------------------
DumpSig                            FAQ

Ex 15.4-5 of introduction to algorithms

Give an O(n squared)-time algorithm to find the longest monotonically increasing subsequence of a sequence of n numbers.

A brute-force approach is enumerate all subsequences of the n numbers and find out a monotonically increasing one with the longest length. This algorithm has a poor exponential running time.
This problem exhibit optimal substructure and overlapping subproblems properties, and is suited for dynamic programming.

Let A denotes the array of n numbers, A[i] is the ith number of the array.
Let P{i,j} denotes problem of finding the longest monotonically increasing subsequence. So the original problem is P{1,n}.
Let S(i,j) denotes the length of longest subsequence of P{i,j}.
Let M(i,j) denotes the largest number in the longest subsequence of P{i,j}. Note for P{i,j}, there may exist several subsequence has the same longest length, M(i,j) should be the smallest one of them.

For P{1,j}, if A[j] is larger than M(1,j-1), then S(1,j) equals S(1,j-1) plus 1. Otherwise, it equals S(1,j-1).
So, we have:
  S(1,j) = S(1,j-1) if A[j] <> M[1,j-1]

The complicated thing in this procedure is how to maintain M's value correctly. The idea is if A[j] is larger than M(1,j-1), M(1,j) should be A[j]. If A[j] is smaller than M(1,j-1), M(1,j) should either be M(1,j-1) or A[j] if A[j] is larger than M(1,x) where S(1,x) is less than S(1,j-1).

This equation yields a n squared running time algorithm.


Having done some tests, the preceding algorithm failed for this case: "8 9 1 2 3 4".  

The recursion can be performed another way. Let S(i) denotes the length of the longest subsequence that ended with item A(i). So, the relationship between a problem and its subproblem can be expressed as:

S(i) = max{ S(k)+1 } (k is between 0 and i -1) that all k satisfies A[k] is less than A[i]

And the final result is the largest one of array S.

Source code for this solution is here:

Wednesday, January 13, 2010

google against GFW

I keep using because I was hoping it will not be banned by the GFW some day.
But, according to this, a new approach to china, it seems it's becoming less likely will be unbanned. And things may become worse in china, more and more google services may be affected.

To some extent, I'm glad to see google can hold its motto "Don't be evil" firmly and stand out to fight against such extreme information blocking. But, it's hard to show full support to this action considering its consequence for google & chinese people. Just wait and see how chinese gov will response.

Update on March 23, 2010:
Finally, it happens: A new approach to China: an update. Applaud for you, google!

Sunday, January 10, 2010

fix "no module named readline" on windows

I tried to download android source code by following instructions in this article: Using Repo and Git. Because I'm working with cygwin on windows, the readline module isn't available. The repo script failed to run and compained "no module named readline".
Luckily, the Ipython project provides an alternative readline module named pyreadline that can be used on windows and mac osx.
To install it:
1. Download pyreadline for windows.
2. unzip the setup file.
3. Copy PURELIB/ and PURELIB/pyreadline to $(python_installation_folder)/Lib

After it's done, the repo script can run successfully.

Update: In order not to get the python installation folder polluted, we can create a directory for all manually installed python modules. And set PYTHONPATH environment variable to this directory so that modules installed here can be loaded.

Saturday, January 9, 2010

remove visual sourcesafe binding

I need to change a visual stuio 2003 project's source control system from visual sourcesafe to subversion. In order to commit clean source files to svn, I first need to completely remove vss binding from the project.
There are a lot of articles about how to do this, I was following this one: Removing a Solution from Sourcesafe. But the problem is the project is comprised of two solutions, each with a bunch of projects. It's tedious to manually edit them manually. So, I managed to do this with following commands. These commands are available in cygwin on windows.

1. remove vss files
find ./ -name "*scc" | xargs rm
Explaination: this command uses find to get all files whose name end in "scc", then pipes the list to xargs. xargs will format the list in acceptable format and invoke rm to delete them.
There is a utility also named find on windows. In order to make sure the gnu find is invoked, I arranges the PATH environment variable so that the cygwin/bin folder precedes C:\windows\system32.

2. remove vss information in project files
find ./ -name "*.csproj" -exec sed -i '/Scc/ d'
Explaination: this command finds all files whose extension are ".csproj", and execute " sed -i '/Scc/ d' " on each file found. sed is a stream editor. This sed command searches for lines that have Scc and delete them. And the -i argument tells sed to edit file in place, so that the project gets updated.

3. restore file permission
find ./ -name "*.csproj" -exec chmod +rw {} ;
Explanation: after sed modifies a file, I lose all permission on that file. So I need to restore file permission with chmod. There is also a windows command line utility cacls can do this. And cacls can be a better choice here since microsoft internal tool may set file permission more properly than chmod.

Thursday, January 7, 2010

android property system

Property system is an important feature on android. It runs as a service and manages system configurations and status. All these configurations and status are properties. A property is a key/value pair, both of which are of string type.
From the sense of function, it's very similar to windows registry. Many android applications and libraries directly or indirectly relies on this feature to determine their runtime behavior. For example, adbd process queries property service to check if it's running in emulator. Another example is the returns the value stored in property service.

How property system works
The high level architecture of property system is shown as following.
android prop sys arch
In the figure, there are three processes, a group of persistent property files and a shared memory block. The shared memory block is the container of all property records. Only the property service process can write to the shared memory block. It'll load property records from persistent the save them in the shared memory.
The consumer process loads the shared memory in its own virtual space and access properties directly. The setter process also loads the shared memory in its virtual space, but it can't write to the memory directly. When the setter tries to add or update a property, it sends the property to property service via unix domain socket. The property service will write the property to shared memory on behalf of the setter process, as well as to the persistent file.

Property service runs inside init process. The init process first creates a shared memory region and stores a fd to the region. Then init process maps the region into its virtual space with mmap with MAP_SHARED flag, as a result, any updates to this area can be seen by all processes. This fd and region size are saved in a environment variable named "ANDROID_PROPERTY_WORKSPACE". Any other processes like consumer and setter will use this environment variable to get the fd and size, so that they can mmap this region into its own virtual space. The layout of the shared memory is shown below.
androi prop mem layout
After that, init process will load properties from following files:

The next step is start property service. In this step, a unix domain socket server is created. This socket's pathname is "/dev/socket/property_service" which is well known to other client processes.
Finally, init process calls poll to wait for connect event on the socket.

On the consumer side, when it initializes libc(bionic/libc/bionic/libc_common.c __libc_init_common function). It will retrieve the fd and size from environment variable, and map the shared memory into its own space(bionic/libc/bionic/system_properties.c __system_properties_init function). After that, libcutils can read property just as normal memory for the consumer.

Currently, properties can't be removed. That's to say, once a property has been added, it can't be removed, neither can its key be changed.

How to get/set properties

There are three main means to get/set properies on android.

1. native code
When writing native applications, property_get and property_set APIs can be used to get/set properties. To use them, we need to include cutils/properties.h and link against libcutils.

2. java code
Android also provides System.getProperty and System.setProperty functions in java library, our java application can use them to get/set properties.
But it's important to note that although these java APIs are semantically equal to native version, java version store data in a totally different place. Actually, a hashtable is employed by dalvik VM to store properties. So, java properties are separated, it can't get or set native properties, and neither vice versa.

Update: Andrew mentioned that android.os.SystemProperties class can manipulate native properties, though it's intended for internal usage only. It calls through jni into native property library to get/set properties.

3. shell script
Android provides getprop and setprop command line tool to retrieve and update properties. They can be used in shell script. They are implemented on top of libcutils.

Monday, January 4, 2010

understanding the android media framework

android media framework is built on top of a set of media libraries, including OpenCORE, vorbis and sonivox. So one of goal of android media framework is to provide a consistent interface for all services provided by underlying libraries and make them transparent to users.

The figure below shows the dependency relationships between libraries of the media framework.


In this figure, green components are media libraries, yellow components are android internal libraries, grey components are external libraries, and the light blue class is the java consumer of the media framework. Except for class, all components are implemented in c or c++.

The core of the media framework is composed of libmedia, libmediaplayerservice and libmedia_jni. Their codes reside in frameworks/base/media folder.

libmedia defines the inheritance hierarchy and base interfaces. It’s the base library.

libmedia_jni is the shim between java application and native library. First, it implements the JNI specification so that it can be used by java application. Second, it implements the facade pattern for the convenience of caller.

libmediaplayerservice implements some of concrete players and the media service which will manage player instances.

The figure below shows the class hierarchy.android_media_classes

This is a simplified version of the class hierarchy. Only some core classes are included. Classes in blue are defined in libmedia, classes in green are defined in libmediaplayerservice, classes in light yellow are defined in binder, which implements the IPC on android. And classes in grey are defined in other libs.

Note the BpInterface and BnInterface are template classes. Any instantiation of them also inherit the template argument INTERFACE as well.

In the class hierarchy diagram, though listed as a separate module, binder is actually implemented inside libutils component whose source code locate at /frameworks/base/libs/utils folder.

An interesting thing to note is in android, the application that intends to show the media content and the player that actually renders the media content run in different process. The red line in the sequence diagram below shows the boundary of two processes.


The figure shows three most common operations, creating a new player, setting datasource and playing. The last MediaPlayerBase object is the interface that MediaPlayerService::Client object uses to refer to the concrete player instance. The concrete player can be VorbisPlayer, PVPlayer, or any other player, depending on the type of the media to be played.

When an application creates a object, it’s actually holding a proxy which can be used to manipulate the concrete player resides in the mediaserver process. During the whole procedure, two process communicates with Binder IPC mechanism.

Having knowledge above, it’s not difficult to understand why MediaPlayer doesn’t provide an API to use memory stream as source. Because the memory manipulated by the stream is in the address space of the application, and it’s not directly accessible by the mediaserver process.


Google I/O, Mastering the Android Media Framework

android media framework uml diagram