Posted by chants, 07-22-2019, 14:22
IDA SDK C++ tricks and tips

IDA offers a good amount of information about functions, but its SDK is basically undocumented when it comes to accessing some of it. Fortunately, ida64.dll can be decompiled, and together with the little the SDK docs do explain, that is enough to work out how these structures get filled.

Functions: func_t (funcs.hpp) offers function details, including these members:

Code:
      uval_t frame;        ///< netnode id of frame structure - see frame.hpp
...
        // the following fields should not be accessed directly:

      uint32 pntqty;       ///< number of SP change points
      stkpnt_t *points;    ///< array of SP change points.
                           ///< use ...stkpnt...() functions to access this array.

      int regvarqty;       ///< number of register variables (-1-not read in yet)
                           ///< use find_regvar() to read register variables
      regvar_t *regvars;   ///< array of register variables.
                           ///< this array is sorted by: start_ea.
                           ///< use ...regvar...() functions to access this array.

      int llabelqty;       ///< number of local labels
      llabel_t *llabels;   ///< local labels.
                           ///< this array is sorted by ea.
                           ///< use ...llabel...() functions to access this array.

      int regargqty;       ///< number of register arguments
      regarg_t *regargs;   ///< unsorted array of register arguments.
                           ///< use ...regarg...() functions to access this array.

      int tailqty;         ///< number of function tails
      range_t *tails;      ///< array of tails, sorted by ea.
                           ///< use func_tail_iterator_t to access function tails.
For example, regargs can only be read through this "should not be accessed directly" member. Most of these items can only be queried per address or per item, not all at once for the whole function, even though the structure suggests that should be easy. Furthermore, they are stored packed in netnodes and are only populated on demand: the quantity fields hold correct values, but the data pointers stay null until the first use loads them. So how do you trigger that first use?

Code:
#include <ida.hpp>
#include <funcs.hpp>
#include <frame.hpp>

static void preload_all_functions() //helper name is illustrative
{
	size_t num = get_func_qty();
	for (size_t i = 0; i < num; i++) {
		func_t* f = getn_func(i);
		if (f == nullptr)
			continue;
		get_frame(f); //load frame into internal pointer table
		find_regvar(f, f->start_ea, nullptr); //load regvars
		get_spd(f, f->start_ea); //load points
		read_regargs(f); //load regargs
		get_llabel(f, f->start_ea); //load llabels
		func_tail_iterator_t fti(f); //load tails
	}
}
get_frame(f) (or get_frame(ea)) is the only way to immediately load the frame information from the database. The frame is not stored in the function structure, but once loaded its pointer can be looked up without being unpacked from the database again. If the function's frame member is BADNODE, this returns a null pointer because the function has no frame.

Although it is undocumented, find_regvar(f, ea, _) accepts a nullptr for the third argument, and regardless of the arguments it always loads the regvars member on first use. Presumably one could find the register variables by passing every canonical register name from the ph.reg_names[] array, but that would be overdoing it: a single call with nullptr populates regvars, which can then be accessed directly.

get_spd(f, ea), get_effective_spd(f, ea) and get_sp_delta(f, ea) will always populate the points array on first use if you pass the function pointer. Although the SDK says a null function pointer is allowed, these functions return 0 until one of them has been called at least once on the function in question with the pointer filled in. They do not call get_func(ea) for you as you would expect, so this is simply misdocumented.

recalc_spd(ea) will not reliably work either if none of the get_*sp* functions has been called first; whether it works depends on which address in the function is used. It only walks the instructions of the particular function chunk (tail entry) containing ea, so if that chunk has no SP changes it does nothing and the points stay null. It populates the array only if that exact chunk contains a stack change, which one is unlikely to know in advance. With that in mind, the stack points could be enumerated by going through every instruction in the function with the appropriate enumeration functions and querying the get_*sp* functions to gather the information. But since we know the function already has the array, such intensive searching is fortunately not necessary.

read_regargs(f) populates regargs and is the only way to do so; afterwards the array has to be accessed directly.
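The pattern shared by regvars, points and regargs (a valid count next to a pointer that stays null until some accessor runs) can be modelled without the SDK. The sketch below is purely illustrative: `toy_func_t`, `toy_stkpnt_t`, `load_points_from_db()` and `ensure_points_loaded()` are hypothetical names, and the fake loader generates placeholder entries instead of unpacking a netnode.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; NOT the IDA SDK types.
struct toy_stkpnt_t
{
  uint64_t ea;   // address of an SP change point
  int64_t  spd;  // SP delta recorded at that point
};

struct toy_func_t
{
  uint32_t      pntqty = 0;           // count is valid immediately
  toy_stkpnt_t *points = nullptr;     // stays null until first use
  std::vector<toy_stkpnt_t> storage;  // owns the unpacked data
};

// Hypothetical loader: pretends to unpack the points from the database.
static std::vector<toy_stkpnt_t> load_points_from_db(const toy_func_t &f)
{
  std::vector<toy_stkpnt_t> pts;
  for ( uint32_t i = 0; i < f.pntqty; i++ )
    pts.push_back({ 0x1000 + 4 * uint64_t(i), -8 * int64_t(i + 1) });
  return pts;
}

// Any "first use" accessor triggers the load exactly once.
static toy_stkpnt_t *ensure_points_loaded(toy_func_t &f)
{
  if ( f.points == nullptr && f.pntqty != 0 )
  {
    f.storage = load_points_from_db(f);
    assert(f.storage.size() == f.pntqty);
    f.points = f.storage.data();
  }
  return f.points;
}
```

The takeaway is that `pntqty` alone says nothing about whether `points` is safe to read; the pointer is only valid after an accessor has run, which is why the preload loop calls one accessor per field.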

Now that you have a regarg_t structure, what about its strange type_t *type member? Mostly we want a tinfo_t, not raw type_t data.

Again, this is a poorly documented and tricky-to-access structure. We do not want to parse it ourselves, since the format is not portable across SDK versions and is packed in a very detailed encoding. But if you pass the member to the API directly, it will cause memory corruption and a crash, or bad data.

Code:
if (f->regargs[i].type != nullptr) {
	tinfo_t ti; //The type information is internally kept as an array of bytes terminated by 0.
	qtype typeForDeser;
	typeForDeser.append(f->regargs[i].type);
	ti.deserialize(get_idati(), &typeForDeser); //will free buffer passed!
}
The underlying deserialize API function looks like this. Note the pointer-to-pointer for the type_t argument: you must pass an IDA API compatible buffer.
Code:
decl bool ida_export deserialize_tinfo(tinfo_t *tif, const til_t *til, const type_t **ptype, const p_list **pfields, const p_list **pfldcmts);
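Why a pointer to a pointer? The usual reason for this shape is that the decoder reads from `*ptype` and writes back how far it got, so the caller's cursor moves past the consumed bytes; presumably the same applies here. The standalone sketch below models only that calling convention plus the "copy into a buffer you own first" step from the regarg example. `toy_type_t`, `toy_deserialize()` and `copy_type_string()` are hypothetical, and real type_t strings use IDA's binary encoding rather than readable characters.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using toy_type_t = uint8_t;  // stand-in for IDA's byte-sized type_t

// Hypothetical decoder: reads one 0-terminated type string starting at
// *ptype and advances *ptype past the terminator, which is how a
// double-pointer API can report how much it consumed.
bool toy_deserialize(std::string *out, const toy_type_t **ptype)
{
  if ( ptype == nullptr || *ptype == nullptr )
    return false;
  const toy_type_t *p = *ptype;
  out->clear();
  while ( *p != 0 )
    out->push_back(char(*p++));
  *ptype = p + 1;  // skip the terminator
  return true;
}

// Copy the raw, 0-terminated bytes into a buffer we own before decoding,
// so the decoder never works on memory that belongs to someone else.
std::vector<toy_type_t> copy_type_string(const toy_type_t *raw)
{
  assert(raw != nullptr);
  size_t n = 0;
  while ( raw[n] != 0 )
    n++;
  return std::vector<toy_type_t>(raw, raw + n + 1);  // keep the terminator
}
```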
As for function tails, these are relatively well documented right in funcs.hpp, which finally provides an iterator so the internal member does not need to be touched. You cannot index the tails, only enumerate them, but that is a very small downside, and in this case the SDK method is recommended:
Code:
      func_tail_iterator_t fti(pfn);
      for ( bool ok=fti.first(); ok; ok=fti.next() )
        const range_t &a = fti.chunk();
However, if you want direct access to the array and just need it preloaded, constructing the iterator will do the trick (as will anything that ultimately invokes func_tail_iterator_set(func_tail_iterator_t*, f, ea)):
Code:
      func_tail_iterator_t fti(pfn);
Finally, labels are a different situation. The llabel_t structure for a local label and its 3 associated functions, set_llabel, get_llabel_ea and get_llabel, all come with the caveat: "These are LOW LEVEL FUNCTIONS. When possible, they should not be used. Use high level functions from <name.hpp>". Instead, set_name, get_name_ea and get_name respectively should be used. And yes, names could be enumerated by walking through each assembly instruction or by using the various next_* search functions. But what about populating the array for quick and easy direct access to all of a function's labels? These functions do not seem to be in ida64.dll; they are provided as .lib methods instead. But the lib just forwards them to the recommended functions, so it does not really matter.

So get_llabel(f, ea), or the corresponding get_name call, will reliably do the job, and there is no other suitable candidate. get_llabel_ea(f, nullptr) and get_name_ea with a null name might also work, if they accept a null pointer for the string.
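Once the llabels array has been loaded, direct access is ordinary array work. The header comment says the array is sorted by ea, so a lookup by address can be a binary search, while a lookup by name has to scan. The sketch uses hypothetical names (`toy_llabel_t`, `toy_get_llabel()`, `toy_get_llabel_ea()`) that loosely mirror get_llabel and get_llabel_ea; it does not touch the SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct toy_llabel_t { uint64_t ea; std::string name; };  // hypothetical

// Exact-address lookup in a label array kept sorted by ea.
// Returns nullptr when no label sits at that address.
const char *toy_get_llabel(const std::vector<toy_llabel_t> &labels, uint64_t ea)
{
  // Debug-only precondition check (linear, so it defeats the binary search).
  assert(std::is_sorted(labels.begin(), labels.end(),
         [](const toy_llabel_t &a, const toy_llabel_t &b) { return a.ea < b.ea; }));
  auto it = std::lower_bound(labels.begin(), labels.end(), ea,
         [](const toy_llabel_t &l, uint64_t x) { return l.ea < x; });
  return (it != labels.end() && it->ea == ea) ? it->name.c_str() : nullptr;
}

// Reverse lookup by name has no ordering to exploit, so it is a linear scan.
bool toy_get_llabel_ea(const std::vector<toy_llabel_t> &labels,
                       const std::string &name, uint64_t *ea)
{
  for ( const toy_llabel_t &l : labels )
  {
    if ( l.name == name )
    {
      *ea = l.ea;
      return true;
    }
  }
  return false;
}
```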

IDA provides a very practical SDK for a very practical purpose. With a very big support price tag attached, it is no wonder there is poor documentation and a lack of consistency. Working on the clock, it would be easier to justify a paper trail of support emails asking about these kinds of details than to painstakingly track them down through trial and error or by reverse engineering the SDK lib files or ida64.dll.

Perhaps these are part of IDA Pro's trade secrets, if you were to think of it as such.

It was quite difficult to get these simple things working correctly as none of them are properly documented. I hope this helps someone on their own IDA SDK development efforts.

In a multithreaded plugin, preloading everything is actually a necessity to minimize thread-safety issues, since various database accesses are unsafe and prone to race conditions. Once all of these structures are preloaded, accessing them, even through the nicer API methods, will not cause any database calls. Unfortunately, the IDA SDK is designed mostly for single-threaded use, so doing a dump or making lots of UI-thread callbacks would be the only other options. Preloading mitigates all contention except that caused by the user's own interaction changing the data structures in use.
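The preload-then-read idea can be sketched with plain standard C++ threads. Assumptions: `toy_func_snapshot_t` and `sum_sp_changes_parallel()` are hypothetical; in a plugin, the snapshots would be filled on the main thread (for example by a preload loop like the one earlier) and only then handed to workers. Whether the live IDA structures are safe to read concurrently once preloaded is the author's observation, not an SDK guarantee, so the sketch has workers read only from data copied into snapshots.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical per-function snapshot filled on the main (IDA) thread.
struct toy_func_snapshot_t
{
  uint64_t start_ea;
  std::vector<int64_t> sp_changes;  // stands in for the preloaded arrays
};

// Workers only read the immutable snapshots, so they never need to call
// back into the single-threaded database.
int64_t sum_sp_changes_parallel(const std::vector<toy_func_snapshot_t> &funcs,
                                unsigned nthreads)
{
  assert(nthreads > 0);
  std::vector<int64_t> partial(nthreads, 0);  // one slot per thread
  std::vector<std::thread> workers;
  for ( unsigned t = 0; t < nthreads; t++ )
  {
    workers.emplace_back([&, t]()
    {
      for ( size_t i = t; i < funcs.size(); i += nthreads )
        partial[t] += std::accumulate(funcs[i].sp_changes.begin(),
                                      funcs[i].sp_changes.end(), int64_t(0));
    });
  }
  for ( std::thread &w : workers )
    w.join();
  return std::accumulate(partial.begin(), partial.end(), int64_t(0));
}
```

Each worker writes only its own slot in `partial`, so no locking is needed; the main thread combines the results after join().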

Last edited by chants; 07-22-2019 at 14:42.