Generating a Cross-Platform Unique Machine Fingerprint
Sometimes a program needs to know whether it is still running on the same machine. There are many reasons for this, but a common one is a licensing module: you generate a machine ID, get a cryptographic license key from a server, and then each time the program runs you check the machine’s fingerprint against the one stored in the license file.
But how to identify the machine?
Hardware IDs: Many pieces of hardware have a hardware serial number. This includes the CPU, each of the hard disks, and all the network interfaces, which each have a globally unique MAC address.
Software IDs: Each disk volume generally will also have a software serial number. You can also use the machine’s name as part of the identifier. If the user has changed the machine’s name then this is an indication that the user considers this a different system.
The software IDs are under the user’s control and can be duplicated: a user could clone the machine so that the clone has the same name and the same volume IDs as the original. The hardware IDs are much harder to duplicate.
So the strategy is, get all these numbers, hash them together to create a fingerprint, and then compare the fingerprint against the license file.
We also want a little flexibility: if a few components are upgraded, the fingerprint will change, but it may still be similar enough to treat a machine with, say, a new network card or a replaced hard disk as the same machine for your purposes.
Getting the IDs
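Let’s get to work. First we create a cross-platform set of functions to return these IDs. Here is the header:

    // machine_id.h
    // u8, s8, u16 and u32 are assumed to be fixed-width integer typedefs
    // (for example from <stdint.h>); the post does not show them.
    void getMacHash( u16& mac1, u16& mac2 );
    u16 getVolumeHash();
    u16 getCpuHash();
    const char* getMachineName();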
Now on Windows you get these ids like this:
    #include "machine_id.h"
    #include <windows.h>
    #include <intrin.h>
    #include <iphlpapi.h>
    #pragma comment( lib, "iphlpapi.lib" )   // GetAdaptersInfo is in this library

    // we just need this for purposes of unique machine id.
    // So any one or two mac's is fine.
    u16 hashMacAddress( PIP_ADAPTER_INFO info )
    {
        u16 hash = 0;
        for ( u32 i = 0; i < info->AddressLength; i++ )
        {
            hash += ( info->Address[i] << (( i & 1 ) * 8 ));
        }
        return hash;
    }

    void getMacHash( u16& mac1, u16& mac2 )
    {
        // start from known values so that a machine with fewer than
        // two adapters still produces a stable result
        mac1 = 0;
        mac2 = 0;

        IP_ADAPTER_INFO AdapterInfo[32];
        DWORD dwBufLen = sizeof( AdapterInfo );

        DWORD dwStatus = GetAdaptersInfo( AdapterInfo, &dwBufLen );
        if ( dwStatus != ERROR_SUCCESS )
            return; // no adapters.

        PIP_ADAPTER_INFO pAdapterInfo = AdapterInfo;
        mac1 = hashMacAddress( pAdapterInfo );
        if ( pAdapterInfo->Next )
            mac2 = hashMacAddress( pAdapterInfo->Next );

        // sort the mac addresses. We don't want to invalidate
        // both macs if they just change order.
        if ( mac1 > mac2 )
        {
            u16 tmp = mac2;
            mac2 = mac1;
            mac1 = tmp;
        }
    }

    u16 getVolumeHash()
    {
        DWORD serialNum = 0;

        // Read the volume serial number of the C: drive.
        GetVolumeInformationA( "c:\\", NULL, 0, &serialNum, NULL, NULL, NULL, 0 );
        u16 hash = (u16)(( serialNum + ( serialNum >> 16 )) & 0xFFFF );

        return hash;
    }

    u16 getCpuHash()
    {
        int cpuinfo[4] = { 0, 0, 0, 0 };
        // leaf 0 returns the highest supported leaf and the vendor string
        __cpuid( cpuinfo, 0 );
        u16 hash = 0;
        u16* ptr = (u16*)(&cpuinfo[0]);
        for ( u32 i = 0; i < 8; i++ )
            hash += ptr[i];

        return hash;
    }

    const char* getMachineName()
    {
        static char computerName[1024];
        DWORD size = 1024;
        GetComputerNameA( computerName, &size );
        return &(computerName[0]);
    }
On Linux and OS X you get them like this:
    #include "machine_id.h"

    #include <assert.h>
    #include <string.h>
    #include <unistd.h>
    #include <errno.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/ioctl.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <netinet/in_systm.h>
    #include <netinet/ip.h>
    #include <netinet/ip_icmp.h>

    #ifdef DARWIN
    #include <net/if_dl.h>
    #include <ifaddrs.h>
    #include <net/if_types.h>
    #else //!DARWIN
    #include <linux/if.h>
    #include <linux/sockios.h>
    #endif //!DARWIN

    #include <sys/resource.h>
    #include <sys/utsname.h>

    //---------------------------------get MAC addresses ---------------------------------
    // we just need this for purposes of unique machine id. So any one or two
    // mac's is fine.
    u16 hashMacAddress( u8* mac )
    {
        u16 hash = 0;
        for ( u32 i = 0; i < 6; i++ )
        {
            hash += ( mac[i] << (( i & 1 ) * 8 ));
        }
        return hash;
    }

    void getMacHash( u16& mac1, u16& mac2 )
    {
        mac1 = 0;
        mac2 = 0;

    #ifdef DARWIN

        struct ifaddrs* ifaphead;
        if ( getifaddrs( &ifaphead ) != 0 )
            return;

        // iterate over the net interfaces
        bool foundMac1 = false;
        struct ifaddrs* ifap;
        for ( ifap = ifaphead; ifap; ifap = ifap->ifa_next )
        {
            struct sockaddr_dl* sdl = (struct sockaddr_dl*)ifap->ifa_addr;
            if ( sdl && ( sdl->sdl_family == AF_LINK ) && ( sdl->sdl_type == IFT_ETHER ))
            {
                if ( !foundMac1 )
                {
                    foundMac1 = true;
                    mac1 = hashMacAddress( (u8*)(LLADDR(sdl)));
                }
                else
                {
                    mac2 = hashMacAddress( (u8*)(LLADDR(sdl)));
                    break;
                }
            }
        }

        freeifaddrs( ifaphead );

    #else // !DARWIN

        int sock = socket( AF_INET, SOCK_DGRAM, IPPROTO_IP );
        if ( sock < 0 )
            return;

        // enumerate all IP addresses of the system
        struct ifconf conf;
        char ifconfbuf[ 128 * sizeof(struct ifreq) ];
        memset( ifconfbuf, 0, sizeof( ifconfbuf ));
        conf.ifc_buf = ifconfbuf;
        conf.ifc_len = sizeof( ifconfbuf );
        if ( ioctl( sock, SIOCGIFCONF, &conf ))
        {
            assert(0);
            close( sock );   // don't leak the socket on failure
            return;
        }

        // get MAC address
        bool foundMac1 = false;
        struct ifreq* ifr;
        for ( ifr = conf.ifc_req; (s8*)ifr < (s8*)conf.ifc_req + conf.ifc_len; ifr++ )
        {
            if ( ioctl( sock, SIOCGIFFLAGS, ifr ))
                continue; // failed to get flags, skip it

            if ( ioctl( sock, SIOCGIFHWADDR, ifr ) == 0 )
            {
                // SIOCGIFHWADDR stores the hardware address in ifr_hwaddr
                if ( !foundMac1 )
                {
                    foundMac1 = true;
                    mac1 = hashMacAddress( (u8*)ifr->ifr_hwaddr.sa_data );
                }
                else
                {
                    mac2 = hashMacAddress( (u8*)ifr->ifr_hwaddr.sa_data );
                    break;
                }
            }
        }

        close( sock );

    #endif // !DARWIN

        // sort the mac addresses. We don't want to invalidate
        // both macs if they just change order.
        if ( mac1 > mac2 )
        {
            u16 tmp = mac2;
            mac2 = mac1;
            mac1 = tmp;
        }
    }

    u16 getVolumeHash()
    {
        // we don't have a 'volume serial number' like on windows.
        // Let's hash the system name instead.
        u8* sysname = (u8*)getMachineName();
        u16 hash = 0;

        for ( u32 i = 0; sysname[i]; i++ )
            hash += ( sysname[i] << (( i & 1 ) * 8 ));

        return hash;
    }

    #ifdef DARWIN
    #include <mach-o/arch.h>

    u16 getCpuHash()
    {
        const NXArchInfo* info = NXGetLocalArchInfo();
        u16 val = 0;
        val += (u16)info->cputype;
        val += (u16)info->cpusubtype;
        return val;
    }

    #else // !DARWIN

    static void getCpuid( u32* p, u32 ax )
    {
        // ebx is saved and restored around cpuid
        __asm __volatile
        (   "movl %%ebx, %%esi\n\t"
            "cpuid\n\t"
            "xchgl %%ebx, %%esi"
            : "=a" (p[0]), "=S" (p[1]),
              "=c" (p[2]), "=d" (p[3])
            : "0" (ax)
        );
    }

    u16 getCpuHash()
    {
        u32 cpuinfo[4] = { 0, 0, 0, 0 };
        getCpuid( cpuinfo, 0 );
        u16 hash = 0;
        u32* ptr = (&cpuinfo[0]);
        for ( u32 i = 0; i < 4; i++ )
            hash += (ptr[i] & 0xFFFF) + ( ptr[i] >> 16 );

        return hash;
    }
    #endif // !DARWIN

    const char* getMachineName()
    {
        static struct utsname u;

        if ( uname( &u ) < 0 )
        {
            assert(0);
            return "unknown";
        }

        return u.nodename;
    }
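To make the folding step concrete, here is a small standalone sketch of the same arithmetic as hashMacAddress, applied to two made-up MAC addresses. The name foldMac and both sample addresses are assumptions for illustration only; they are not part of the original code.

```cpp
#include <cassert>
#include <cstdint>

using u8  = std::uint8_t;
using u16 = std::uint16_t;
using u32 = std::uint32_t;

// Same arithmetic as hashMacAddress: even-indexed bytes are added into the
// low byte, odd-indexed bytes into the high byte, and the sum wraps at 16 bits.
u16 foldMac( const u8* mac )
{
    u16 hash = 0;
    for ( u32 i = 0; i < 6; i++ )
        hash = (u16)( hash + ( mac[i] << (( i & 1 ) * 8 )));
    return hash;
}

// Hypothetical sample addresses, for illustration only.
const u8 kMacA[6] = { 0x00, 0x1A, 0x2B, 0x3C, 0x4D, 0x5E };
// Same bytes as kMacA, with bytes 0 and 2 swapped.
const u8 kMacB[6] = { 0x2B, 0x1A, 0x00, 0x3C, 0x4D, 0x5E };
```

Both sample addresses fold to 0xB478, because the sum ignores the order of bytes that share the same parity. A 16-bit fold like this is compact but far from collision free, which is one reason the fingerprint combines several components.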
Next we combine them to create a fingerprint.
First we create some information-hiding functions. They make it a little less obvious where the hashes came from and a little harder to reverse engineer. This is obfuscation, not true cryptographic security.

    u16 mask[5] = { 0x4e25, 0xf4a1, 0x5437, 0xab41, 0x0000 };

    static void smear( u16* id )
    {
        // fold every later block into each earlier one
        for ( u32 i = 0; i < 5; i++ )
            for ( u32 j = i; j < 5; j++ )
                if ( i != j )
                    id[i] ^= id[j];

        for ( u32 i = 0; i < 5; i++ )
            id[i] ^= mask[i];
    }

    static void unsmear( u16* id )
    {
        for ( u32 i = 0; i < 5; i++ )
            id[i] ^= mask[i];

        // undo the folding, working back from the last block
        for ( u32 i = 0; i < 5; i++ )
            for ( u32 j = 0; j < i; j++ )
                if ( i != j )
                    id[4-i] ^= id[4-j];
    }
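A quick way to convince yourself that unsmear really inverts smear is a round-trip check. The sketch below copies the two functions as given above and adds a helper, roundTrips, which is a hypothetical name introduced here for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using u16 = std::uint16_t;
using u32 = std::uint32_t;

static const u16 mask[5] = { 0x4e25, 0xf4a1, 0x5437, 0xab41, 0x0000 };

static void smear( u16* id )
{
    for ( u32 i = 0; i < 5; i++ )
        for ( u32 j = i + 1; j < 5; j++ )
            id[i] ^= id[j];
    for ( u32 i = 0; i < 5; i++ )
        id[i] ^= mask[i];
}

static void unsmear( u16* id )
{
    for ( u32 i = 0; i < 5; i++ )
        id[i] ^= mask[i];
    for ( u32 i = 0; i < 5; i++ )
        for ( u32 j = 0; j < i; j++ )
            id[4-i] ^= id[4-j];
}

// Returns true when unsmear( smear( x )) reproduces x exactly.
bool roundTrips( const u16* original )
{
    u16 work[5];
    std::memcpy( work, original, sizeof( work ));
    smear( work );
    unsmear( work );
    return std::memcmp( work, original, sizeof( work )) == 0;
}
```

The last block is only XORed with a zero mask, so it passes through smear unchanged; unsmear uses it as the starting point to peel the other blocks back out one at a time.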
Now we use the 16-bit hashes to create an 80-bit machine fingerprint. We use the CPU ID, the volume ID, and the first two MAC addresses for the first 64 bits, then add 16 bits of check digits:

    static u16* computeSystemUniqueId()
    {
        static u16 id[5];
        static bool computed = false;

        if ( computed )
            return id;

        // produce a number that uniquely identifies this system.
        id[0] = getCpuHash();
        id[1] = getVolumeHash();
        getMacHash( id[2], id[3] );

        // fifth block is some checkdigits
        id[4] = 0;
        for ( u32 i = 0; i < 4; i++ )
            id[4] += id[i];

        smear( id );

        computed = true;
        return id;
    }
This produces a human-readable version of the ID that includes the system name, so users can match fingerprints to systems when managing license files:

    const char* getSystemUniqueId()
    {
        // get the name of the computer
        KxCbuf buf;
        buf << getMachineName();

        u16* id = computeSystemUniqueId();
        for ( u32 i = 0; i < 5; i++ )
        {
            char num[16];
            snprintf( num, 16, "%x", id[i] );
            buf << "-";
            // left-pad each block to four hex digits
            switch( strlen( num ))
            {
                case 1: buf << "000"; break;
                case 2: buf << "00";  break;
                case 3: buf << "0";   break;
            }
            buf << num;
        }

        // upper-case the whole string, including the machine name
        char* p = buf.getBuffer();
        while ( *p )
        {
            *p = toupper( *p );
            p++;
        }

        return KxSymbol( buf.getBuffer()).string();
    }
KxCbuf is a convenient stream object that holds a text buffer. You can substitute std::string if you don’t have something like this of your own already.
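If you go the std::string route, the formatting step could look roughly like the sketch below. The function name formatFingerprint and its signature are assumptions for illustration; it takes the machine name and the five 16-bit blocks as arguments so that it does not depend on the platform functions.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstdio>
#include <string>

using u16 = std::uint16_t;

// Builds "NAME-XXXX-XXXX-XXXX-XXXX-XXXX" from a machine name and five
// 16-bit blocks. Hypothetical helper, not part of the original code.
std::string formatFingerprint( const std::string& machineName, const u16* id )
{
    std::string out = machineName;
    for ( int i = 0; i < 5; i++ )
    {
        char num[8];
        // %04X zero-pads each block to four upper-case hex digits
        std::snprintf( num, sizeof( num ), "-%04X", (unsigned)id[i] );
        out += num;
    }
    // upper-case the whole string, name included, as the original does
    for ( char& c : out )
        c = (char)std::toupper( (unsigned char)c );
    return out;
}
```

Using %04X replaces the manual zero-padding switch, and returning a std::string by value avoids handing out a pointer into a temporary buffer.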
This is how we validate the fingerprint.
    static bool validate( KxSymbol testIdString )
    {
        // unpack the given string. parse failures return false.
        KxCbuf testString;
        testString << testIdString;
        char* testName = strtok( testString.getBuffer(), "-" );
        if ( !testName )
            return false;

        u16 testId[5];
        for ( u32 i = 0; i < 5; i++ )
        {
            char* testNum = strtok( NULL, "-" );
            if ( !testNum )
                return false;
            testId[i] = (u16)(strtol( testNum, NULL, 16 ));
        }
        unsmear( testId );

        // make sure this id is valid - by looking at the checkdigits
        u16 check = 0;
        for ( u32 i = 0; i < 4; i++ )
            check += testId[i];
        if ( check != testId[4] )
            return false;

        // get the current system information
        u16 systemId[5];
        memcpy( systemId, computeSystemUniqueId(), sizeof( systemId ));
        unsmear( systemId );

        // now start scoring the match
        u32 score = 0;

        for ( u32 i = 0; i < 4; i++ )
            if ( testId[i] == systemId[i] )
                score++;

        if ( !strcmp( getMachineName(), testName ))
            score++;

        // if we score 3 points or more then the id matches.
        return score >= 3;
    }
Hi Rafael, I realise this is a few years ago now, but I found this article really helpful, so many thanks for that.
I just wanted to check a detail. I am looking to implement something similar which is Mac specific and does not need to be cross platform. I have been looking carefully through the above code to understand the implications in terms of uniqueness and flexibility for users changing hardware. I wanted to check my understanding with you, if you could spare a moment, as I wondered if there was an issue relating to the potential uniqueness of the fingerprint for license validation.
My understanding is the algorithm is creating 6 components and that on the Mac platform these components are:
– Computer name
– CPU hash which would be unique only for machines with different processor models
– Volume hash which is based on the computer name and would match if the computer name also matched
– Hash of the Mac addresses of the first two found net interfaces
– Check digits which just verify that the CPU hash, Volume Hash and Mac Hashes are valid and haven’t been corrupted in the fingerprint
The fingerprint is considered a match if 3 or more of the components match (excluding the check digit).
My question is, the algorithm would presumably generate a match if the following 3 bits of information were the same:
– Computer name
– CPU Hash
– Volume Hash
And given that the volume hash is generated from the computer name, does this not mean that two users running identical MacBook Pros (a good example, because the same model will have the same hardware) and with the same computer name would generate a matching fingerprint?
Would a sensible fix for this issue be to reduce the tolerance for hardware change and require 4 elements to match, so that at least one MAC address must match? I presume this issue arises because the Windows algorithm uses a volume serial number rather than the machine name, and so is likely more unique than on a Mac, where the volume hash is generated from the machine name.
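The scenario described here can be checked with a small sketch of the scoring rule. The Components struct and the matchScore and matches functions are hypothetical names introduced for illustration, and the sample values are made up; the threshold parameter stands in for the hard-coded 3 in validate.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using u16 = std::uint16_t;

// Unsmeared fingerprint blocks plus the machine name.
struct Components
{
    std::string name;
    u16 cpu, volume, mac1, mac2;
};

// One point per matching component, as in validate().
int matchScore( const Components& a, const Components& b )
{
    int score = 0;
    if ( a.cpu    == b.cpu    ) score++;
    if ( a.volume == b.volume ) score++;
    if ( a.mac1   == b.mac1   ) score++;
    if ( a.mac2   == b.mac2   ) score++;
    if ( a.name   == b.name   ) score++;
    return score;
}

// threshold = 3 reproduces the original rule; threshold = 4 is the
// stricter variant proposed in the comment above.
bool matches( const Components& a, const Components& b, int threshold )
{
    return matchScore( a, b ) >= threshold;
}
```

With two machines that share a name, CPU hash and name-derived volume hash but have different MAC hashes, the score is 3, so a threshold of 3 accepts them and a threshold of 4 rejects them. A single machine that swaps one network card still scores 4 and passes the stricter rule.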
A second thing I spotted, which I am not sure is a problem, is that when I implemented a similar algorithm it was not able to handle volume names with a “-” in them, because the tokenising split on it and all the other elements of the fingerprint were then incorrect. I had to modify my algorithm to use the last 5 elements rather than starting at the beginning.
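One way to take the last five elements, as described here, is to peel the hex fields off the right-hand end of the string and treat whatever remains as the name. This is a minimal sketch; splitFingerprint is a hypothetical name and the sample strings in the test are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

using u16 = std::uint16_t;

// Splits "NAME-XXXX-XXXX-XXXX-XXXX-XXXX" by taking five hex fields from
// the right, so a name that itself contains '-' stays intact.
// Returns false if there are fewer than five fields or the name is empty.
bool splitFingerprint( const std::string& text, std::string& name, u16 id[5] )
{
    std::string rest = text;
    for ( int i = 4; i >= 0; i-- )
    {
        std::string::size_type dash = rest.rfind( '-' );
        if ( dash == std::string::npos || dash + 1 >= rest.size() )
            return false;   // missing or empty field
        id[i] = (u16)std::strtoul( rest.c_str() + dash + 1, nullptr, 16 );
        rest.erase( dash ); // drop the field and its dash
    }
    if ( rest.empty() )
        return false;
    name = rest;
    return true;
}
```

Like the strtol call in validate, strtoul here silently turns non-hex text into 0, so the check digits remain the real guard against a malformed string.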
Thanks for your work putting this page together, I hope the above is of interest.
Yeah. In a later version of this code, for OS X I abandoned the idea of using a disk volume hash, since there is no volume serial number, and added a “system serial number” as a component, since each OS X machine has a unique serial number on the motherboard that you can read. Code looks like this:
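The author's code for this step is not reproduced on this page. As a stand-in, here is a hedged sketch of how a serial number hash could be read through IOKit. It is not the author's implementation: only the name getSystemSerialNumberHash comes from the discussion, while hashSerialString, the use of the compiler's __APPLE__ macro instead of DARWIN, and the choice to return 0 on failure are all assumptions. On macOS it would need to be linked against the IOKit and CoreFoundation frameworks.

```cpp
#include <cassert>
#include <cstdint>

#ifdef __APPLE__
#include <CoreFoundation/CoreFoundation.h>
#include <IOKit/IOKitLib.h>
#endif

using u8  = std::uint8_t;
using u16 = std::uint16_t;
using u32 = std::uint32_t;

// Folds a string into 16 bits with the same alternating-byte sum that
// getVolumeHash uses for the machine name. Hypothetical helper.
u16 hashSerialString( const char* serial )
{
    u16 hash = 0;
    for ( u32 i = 0; serial[i]; i++ )
        hash = (u16)( hash + ( (u8)serial[i] << (( i & 1 ) * 8 )));
    return hash;
}

// Returns 0 when the serial number cannot be read (assumed convention),
// and always 0 when built for a non-Apple platform.
u16 getSystemSerialNumberHash()
{
#ifdef __APPLE__
    io_service_t expert = IOServiceGetMatchingService(
        kIOMasterPortDefault, IOServiceMatching( "IOPlatformExpertDevice" ));
    if ( !expert )
        return 0;

    CFTypeRef value = IORegistryEntryCreateCFProperty(
        expert, CFSTR( kIOPlatformSerialNumberKey ), kCFAllocatorDefault, 0 );
    IOObjectRelease( expert );
    if ( !value )
        return 0;

    char serial[128] = { 0 };
    bool ok = CFGetTypeID( value ) == CFStringGetTypeID() &&
              CFStringGetCString( (CFStringRef)value, serial,
                                  sizeof( serial ), kCFStringEncodingUTF8 );
    CFRelease( value );
    return ok ? hashSerialString( serial ) : 0;
#else
    return 0;
#endif
}
```

In the fingerprint this hash would take the slot that getVolumeHash fills on other platforms, which removes the dependence of that slot on the machine name.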
Hi Rafael. Thanks for everything.
Can you send the smear, unsmear, computeSystemUniqueId, getSystemUniqueId and validate functions together with the modified getSystemSerialNumberHash for OS X?
Hi, Rafael. Have you confirmed if __cpuid( cpuinfo, 0 ); returns a unique cpu serial number? I’ve learned that it simply returns “GenuineIntel” and “AuthenticAMD” strings. cpuid( cpuinfo, 1) seems to return other info like stepping id etc. But I don’t know if it’s unique.
I tried to use only the CPU ID in one of my C++ applications, but the IDs are the same for all computers in my office!
__cpuid( cpuinfo, 1) is the signature of a CPU!
ref: https://en.wikipedia.org/wiki/CPUID
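The difference between the two leaves can be seen with a short sketch. It uses the GCC/Clang helper header cpuid.h rather than the MSVC __cpuid intrinsic from the article, and the function names cpuVendor and cpuSignature are hypothetical. On non-x86 builds both functions return an empty or zero value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   // GCC/Clang helper, provides __get_cpuid
#endif

// Leaf 0: the 12-character vendor string such as "GenuineIntel".
// Returns an empty string when CPUID is unavailable.
std::string cpuVendor()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a = 0, b = 0, c = 0, d = 0;
    if ( !__get_cpuid( 0, &a, &b, &c, &d ))
        return std::string();
    char text[12];
    std::memcpy( text + 0, &b, 4 );   // the string is stored in EBX, EDX, ECX order
    std::memcpy( text + 4, &d, 4 );
    std::memcpy( text + 8, &c, 4 );
    return std::string( text, 12 );
#else
    return std::string();
#endif
}

// Leaf 1: EAX holds the family/model/stepping signature.
// Returns 0 when CPUID is unavailable.
std::uint32_t cpuSignature()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a = 0, b = 0, c = 0, d = 0;
    if ( !__get_cpuid( 1, &a, &b, &c, &d ))
        return 0;
    return a;
#else
    return 0;
#endif
}
```

Neither value identifies an individual chip: the vendor string is shared by every CPU from the same manufacturer, and the signature is shared by every CPU of the same family, model and stepping. That is consistent with the observation above that a whole office of similar machines produces the same CPU hash.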
I know very little C, and I am having a hard time compiling this program. Could you please help me compile it?