blob uuid

Discussion in 'Ask the Development Team' started by moosemilker, Apr 24, 2018.

Tags:
  1. moosemilker

    moosemilker Member

    Joined:
    Mar 1, 2018
    Messages:
    38
    Likes Received:
    9
    Curious... I've been wracking my brain to learn the reason for this. Why use a blob as PK and FK to other tables in the database, especially a simple relational database like 'desolationredux'? Blobs are not great for building indices, nor are they great for performance in general. They also complicate joins and other database management activities. Is this a security feature, or perhaps there is some other reason? If so, can someone send a link? As stated at the beginning... Curious. :)

    Food for thought and one of the many references that advise against using blobs as primary keys:

    https://stackoverflow.com/questions...if-i-use-a-blob-field-as-primary-key-in-sqlit

    Why you shouldn't use it
    A primary key is often indexed and used for sorting. A BLOB cannot be indexed which makes it the slowest of all datatypes. In fact, it is the worst choice as primary key and most databases, including the SQL99 standard, disallow it.

    The problem with a BLOB is that its datatype is not known by the database (a BLOB should only be used for anything undefined, like an logo, an image, a word document, that can only be stored as binary data). Hence it cannot optimize it. Another problem is display. A blob cannot simply be displayed as text.

    Most SQL implementations do not allow BLOB fields compared, but SQLite allows it. However, it converts anything your compare it to into a blob and then compares it bit by bit.

    Best alternative
    The best option for a primary key column in SQLite is to use the INTEGER PRIMARY KEY as described here;: http://www.sqlite.org/lang_createtable.html#rowid it gives the best performance (it is already there as the rowid column, it is just aliased).

    Conclusion
    To answer your question: yes, it influences the performance badly. But more importantly, it will make it very hard to manage your tables well. Use INTEGER PRIMARY KEY, it is really the best, guaranteed unique and is blazingly fast.​

    More PK references / reading material:
    UPDATE:
    Digging deeper, I see these primary keys are all 16 character HEX values, but they translate to blobs since they are a blend of characters and numbers. Per all the database research I have done recently on this, they still are inefficient primary keys.

    Another reference, specifically on HEX primary keys:

    Thanks,
    Aaron

    PS - Off topic, but thanks to the database developer for placing cascade deletes on all Foreign Key Constraints! So easy to delete 1 target player's full slice of data that way... MUCH appreciated! :)
     
    Last edited: Apr 25, 2018
    jus61 likes this.
  2. Legodev

    Legodev Developer Staff Member

    Joined:
    Jul 18, 2016
    Messages:
    394
    Likes Received:
    137
    First thank you for your critic, i appreciate it! :)
    -> Before i start to dismantle what you say here the sumup: You are right with everything you say, but the problem is you orientate on SQLite (i guess Lite explains everything :D) not on MySQL/MariaDB and on the basic SQL not on the custom SQL used by MySQL/MariaDB/or other DB Systems like my beloved PostreSQL.

    --> TL: DR please skip to "Why using UUIDs?"

    You are right, an INTEGER PRIMARY KEY would be faster as seen in this link, but not as slow as you think:
    https://cjsavage.com/guides/mysql/insert-perf-uuid-vs-ordered-uuid-vs-int-pk.html

    While planning the database and libredex i was highly inspired by this link:
    https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/

    There you see the same, the UUID is not that slow compared to BIGINT, but you are right its slower.

    You may be right that old databases do not support this (see the second link is from 2003 so 15 years ago), thats why i force everyone to use an recent version of mariadb / mysql.
    https://dev.mysql.com/doc/refman/5.6/en/column-indexes.html - As you see here you can index strings/blobs if they are fixed size.

    Indeed there is an real problem if the key not sequential, but the way the uuid is reordered minimizes this impact.

    -> Arguing with such old thread on technology is, thats is propably outdated. Warning an silly car comparison: I guess 2 years ago we would not have believed that 2018 there will be self driving cars (that kill people).


    This may be in vanilla sql, but not mysql sql or else this would fail while creating the database:
    PHP:
    CREATE TABLE IF NOT EXISTS `desolationredux`.`world` (
      `
    uuidBINARY(16NOT NULL,
    ...
      `
    modsTEXT NULL,
      
    PRIMARY KEY (`uuid`),
      
    UNIQUE INDEX `uuid_UNIQUE` (`uuidASC))
    ENGINE InnoDB;
    Same here, you are totaly right with SQLite, but we are using MySQL, or once i get to is PostgreSQL (which btw has an dedicated uuid datatype which is efficiently processed and i am going to use).

    Yeah this link is about IBMs Informix System not MySQL.

    I could continue but i guess you see where i am heading.

    Why using UUIDs?
    There are several reasons why to use UUIDs some of them by design where we want to head with the Framework APMS and some others because of ARMA.

    getting an identifier for an player, character, object ... is very very fast.
    If i would use in autoincrement Integer Key like you suggest the procedure would be create the objectdata -> libredex -> database -> wait until the database has added the data -> return the increment id to libredex -> return this id to arma -> add the id to the object -> spawn the object.
    While doing this the arma script engine is busy with waiting until every is done, even if we work async this is a huge performance penalty.

    With UUIDs it works the following:
    1. Arma asks libredex hey gimme a UUID, and libredex returns is very fast, its one of the fastest functions of libredex because it bypasses basically everything of its internal structure.
    2. Arma is instantly able to spawn the object as soon as it has everything prepared
    3. Now after it was spawned it gets send to libredex using an fire and forget approach - this means arma does not wait until libredex processed everything and the database added it... Here the risk is that you might never see when an error happens (duh...) but there nothing should get lost, and if it does there is a bug that needs to be isolated, for this we need debug output anyway so use the debug libredex which dumps anything to its log anyway
    So to sum up this topic using an UUID might be a bit slower for the database, but its a very very large performance gain if you know the limitations of arma script engine.

    "UUID primary keys aren't needed for big keyspaces, they are needed when you need multiple processes"
    Quoted from here: https://news.ycombinator.com/item?id=11863492
    APMS und therefor also DSR is planned to support multiple hosts, i guess you already heard of the world traveling feature we are preparing.
    For this feature to work we require several servers (does not matter if server == hardware or server == arma server, i will simply stick to server) to operate on one database. To make this possible we need the "fire and forget" approach where ever possible and to be able to do this we need the UUIDs.

    To reuse what i just wrote "create the objectdata -> libredex -> database -> wait until the database has added the data -> return in increment id to libredex -> return this id to arma -> add the id to the object -> spawn the object. " this would change in:
    "create the objectdata -> libredex -> leave the server and connect to another server maybe in another datacenter -> database -> wait until the database has added the data -> send the data back to the first datacenter -> return the increment id to libredex -> return this id to arma -> add the id to the object -> spawn the object. "

    Now imagine the performance hit for arma, when you try to spawn 100 objects with the old method like it was used in dayz. As per design the uuids, can be generated across multiple machines at the same time and still will be unique. They are unique across the machines, because the machine mac adress is part of the id and that the id is unique on one machine is guaranteed by using the default libarys provided by Microsoft for Windows and libuuid for Linux.

    Additionally there are circumstances where the actual unique autoincrement ids can become not so unique because it could happen that they are reused by MySQL: https://bugs.mysql.com/bug.php?id=199 (now i come with an old link sorry) - This is not an direct issue with libredex, because it would depent on how i code.

    So when scaling the problem is not really the data storage, but the generation of the identifier, if you want read this:
    https://coder.today/software-engineer-from-monolith-to-cloud-auto-increment-to-uuid-a62f92f387c4

    And throwing the ball back: "Why Auto Increment Is A Terrible Idea" :D
    https://www.clever-cloud.com/blog/engineering/2015/05/20/why-auto-increment-is-a-terrible-idea/

    Anyway it took me more then one hour to write this, and meanwhile i forgot some other reasons i wanted to write, i actually had more then these 2. But some final words: back then when i joined the mod, i spent one month on planning libredex and the database. I am not 100% satisfied with what i created, but using UUIDs is one of the best design ideas. I agree i maybe should have mapped the blob to an BIGINT to make it easyer to export and import via normal tools like HeidiSQL... but then again from my part APMS was never planned to be easy to use for inexperienced people. If you know how to use the HEX() function in SQL Statements then this is not an issue at all.
     
  3. moosemilker

    moosemilker Member

    Joined:
    Mar 1, 2018
    Messages:
    38
    Likes Received:
    9
    @Legodev Wow. This is an awesome reply. Thank you for taking the time to explain all of that detail about the database design. I soaked it all up and I wanted to reply yesterday, but the complexities of life are delayed me (aka: work + wife out of town + kids = no time to reply right now). :)

    The gist of my reply is this:
    There are some tough requirements you are adapting to, so the mod can be very flexible for this "world traveling" feature and its surrounding frameworks. That is quite impressive you have done so much research and back your implementation up with research and performance data. Well done.

    Side note:
    A lot of the references I initially brought up are just the basics of database design, memory usage and how data is organized, grouped, written to disk, searched, indexed, etc: Basically CRUD in a nutshell. I had to start somewhere with some references for the research and to talk about points people had previously spoken to.

    Yep. I use the HEX() function with MySql and your database. Easy.

    I may reply later with more content, but wanted to put something here, so you know your typing wasn't for nothing. I love discovering other people's inspirations... :D

    - Aaron
     
    Last edited: Apr 27, 2018
  4. Legodev

    Legodev Developer Staff Member

    Joined:
    Jul 18, 2016
    Messages:
    394
    Likes Received:
    137
    Nice to see that someone went thru and read my text.

    to other library you might know libredex is even more flexible while at the same time being much more restrictive:
    • libredex theoretical supports loading in plugin dlls that follow my api, right now this is not enabled but the api itself is used to implements every advanced functionality like:
      • the database
      • random list generation
      • file access
      • time access
    • to make expanding libredex, maybe one day via additional dlls, as easy as possible libredex implements an abstraction layer for callExtension that arma should provide with:
      • three types of different call patterns
      1. sync call - this omits the whole internal caching etc of libredex and is used for the uuid thing as well as every low level call like fetching the next part of an message - this should never been used for anything then the really basic functionality
      2. async call - simply calls that are stored in an queue and gives the caller an ticked where to get the answer once ready. - once an worker is free he fetches from this queue and executes the requested functionality, once done the return value is saved under the previous handed ticket - this is to be used for every read operation
      3. quiet call - the same as async call but there is no ticket generated and the return value is simply discarded (except for the debug libredex where it gets dumped to the log) - this is to be used for every write operation
      • functions can be called by an human readable name
        • the functioname and the functionpointer are saved as an red/black tree which makes the call by name much faster then most other solutions did and much more flexible
        • once i enable the load dll feature, everything the dlls have to do is to provide an init function that returns the functioname -> function pointer pairs which are then added to this tree. Then they traded the same as internal functions. These functions of course have to follow my signature of the existing functions
          Code:
          std::string class::functionName(std::string &extFunction, ext_arguments &extArgument)
      • parameters getting packed into an data structure (extArgument) which gets forwarded to the called function. the only thing i require is that the parameters are named -> like it is possible in languages like python where you can call an function like:
        Code:
        getObject(owner="BlueDog", objectType="Hat")
        - while some might think this is annoying to create key-value pairs, this makes everything easyer because:
      1. its easyer to debug via the debug log because you see what you are looking at
      2. you are position independent, it does not matter if you send "classname, health, items" or "items, health, classname" in the functions you simply ask this package to give you the information you want like:
        Code:
        std::string listName = extArguments.get<std::string>("listName");
        unsigned int itemAmount = extArguments.get<unsigned int>("itemAmount");
      3. depenting on what you want to do with your function, you can define parameters as optional. So you can make a function context sensitive and you don't have to implement the same functions a hundred times for every possible permutation of the parameters. This leads to the next point.
    • libredex also supports partial informations, meaning for some functions only an minimum set of informations is required. - for example for updating an object the object uuid is required everything else is optional - you only want to update the position of an vehicle? go ahead and only send the position data.
      • the final goal of this is to later speed up object updates even more because serialising the data to send it via the callExtension api is whats actually slow in this whole system. if we manage to only serialize and send the data we actually need, this makes the server much faster.
    while theoretical being expandable in its functionality libredex is limited to a certain amount of database operations
    • they are design that way, that basically everything one might want to do, is possible to achieve using the existing functionality
      • if someone does not find a way to store the data he wants to store, i am always willing to help people to find out how to store it.
      • if even i don't find out how to solve this i am willing to expand libredex my self to add a way to store the data
      • i spent the most time of the design to think about all possibilitys people might want and designed the database according to this.
        • this is the reason why the database is that large and may be very confusing for most people because it never was intended to be used by hand with tools like phpmyadmin
        • instead it is directed towards accessing it with customs tools i still plan to build or simple routines for cleanup that execute statements like the ones pinned to #server_support in the desolation discord.
    • the reason why i do not allow to add more database functionality, meaning making it as flexible as extDB is simple:
    1. this removes the need for users of a plugin to install additional dlls which are potential dangerous
    2. it improves compatibility, as long as two plugins do not use the same name for variables, which is not possible because if they follow our recommendation and use the pluginname as part of the variable they want to store, its impossible to create conflicts.
    3. it makes installing plugins a lot easyer, because since the database is not altered, you don't need to execute any sql commands, add tables or columns ...
    4. people can't make mistaked with their statements and potentionally allow people to fuck with the whole database if the database access for the arma server is not correctly isolated
    the only issue is that former mods trained people to do everything that i don't want to allow them do to like:
    • touching the database with phpmyadmin, navicat or some other tool
    • introduce custom tables and stuff
    • ...
    this is what confuses people the most, that the whole idea about APMS is to force people to think in another direction which at the end is much easyer but people first have to learn to think different.

    Most critic we got in the past was exactly about this fact, that DSR (and so APMS) is shit because:
    • libredex is to complicated: use extDB!!!!!!111111 this works well and everyone knows how to work with it
    • the database is shit why so much tables? why not simply delete objects on their death instead of the stupid shit with killInfos or move them to another table? Why UUIDs nobody can work with them...
    • the Mod is shit because Mod XYZ does not work out of the box
    • the Plugins are silly "I want to simply add 3 lines to init.sqf, why do i have to put so much work in there and create an plugin for this? ..."

    All this critic is basically because people do not understand what we want to create. We don't want to create an easy to use DayZ Epoch/Origins. The whole zombie survival theme is just a cover-up to sell the system to the community, because DayZ SA is still ... and people still want Zombie Survival on ArmA that reminds them at the early DayZ Mod times. Thats boon and bane at the same time.
    Pro: Since DSR is only something like a vehicle to deliver our idea while being an test/development platform, there is no intention to become famous with Desolation REDUX or rich by selling servers. This allows us to target the mod to the few people who want the "true" feeling and don't need follow the herd with making everything easy to play with the big guys.
    Con: on the other hand exactly this is our biggest problem, since we don't want to become big by hook or by crook, its very hard to attract developers that want to help us bringing the mod and so the system forward. - its even hard for our self to keep the motivation up.

    Once people learned to think the way we want to them think because they want to expand Desolation REDUX, they will realize the beauty behind all of this and might start to use APMS the way it is intent:
    What we actually want is to create an framework (APMS) that allows you to easily develop new Mods without starting from 0. Thats why everything we create is directed towards devs not towards server owners, which results in things being overcomplicated for a Zombie Survival Mod. That again explains why people have to pass a certain level to get usefull answers from me and i become unfriendly when i see that the asking person may have some experience with hosting arma servers but has no background knowledge.

    The final goal is to have a framework where you can say:
    • i want something like altis life: then take the database plugin, the vehicle spawn plugin and create your self an sale/work plugin
    • i want an dayz battle royal: then take the zombie plugin, the vehicle spawn plugin, the melee system and add your self the zone shrinking
    • i want to create an stargate mod: take the server traveling, maybe the database plugin, design the worlds, place some enemys and have fun
    • i want desolation redux: then take desolation redux profile and let it install everything that is required and then add what ever you want like task force radio, traders or what ever
     
    moosemilker likes this.

Share This Page