Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-portable generated code for size_t #258

Open
jlblancoc opened this issue May 11, 2023 · 18 comments
Open

Non-portable generated code for size_t #258

jlblancoc opened this issue May 11, 2023 · 18 comments

Comments

@jlblancoc
Copy link
Contributor

Problem

// C++ API to be wrapped:
size_t foo();

Generates code replacing size_t with unsigned long. Which is OK... until you try to build the generated sources in another platform (e.g. armhf, etc.) where size_t becomes something else, so the generated pattern code doesn't match anymore the original C++ API and fails to build. A solution might be re-running binder for each architecture, but I would like to avoid it.

Questions

  • Within binder, where is the type size_t and such converted into their underlying types?
  • What would you think about a new config option to "freeze" certain types such that they are not resolved into their underlying types (if this is even possible...)

If you give me some pointers, I would be glad to attempt to implement it and PR.

@CStewart-Burger
Copy link

CStewart-Burger commented May 11, 2023

I just came here for the same thing. I'm not particularly familiar with clang, but I think you'd want to look at https://github.com/RosettaCommons/binder/blob/67a9fe429698d752dc336122995f838bdeb7f92d/source/function.cpp#LL245C5-L245C5.
I've also noticed some other argument types not being parsed as I'd expect. @jlblancoc , have you tested other stdint types?

@jlblancoc
Copy link
Contributor Author

Thanks for the pointer! I'll try to craft something around that function...

, have you tested other stdint types?

sure, the same happen for examples for uint8_t which gets mapped into unsigned char, but that one is fairly standard and no problematic, but that's not always the case :-(

@lyskov
Copy link
Member

lyskov commented May 12, 2023

@jlblancoc unfortunately in general this is expected behavior and right now i could not offer you any workarounds for solving it. For my main project i have to give on idea of generating binding on one platform and compiling it on the other due to this issue. In short: this issue is due to how LLVM/Clang inner handling of the types is implemented, after function is parsed it types reported in "resolved" form so all information about typedefs is lost.

It has been a while when there was last time i checked on this issue. I will take another look in case newer LLVM versions allows to extract original type info but I think chances of this are slim.

Questions

  • Within binder, where is the type size_t and such converted into their underlying types?

-- this conversion is happened inside LLVM/Clang layer so this is not something in our control.

  • What would you think about a new config option to "freeze" certain types such that they are not resolved into their underlying types (if this is even possible...)

-- similar ideas was considered in the past. Problem with this approach is that we have no way to determine which type instances of unsigned long should be converted to size_t and which should not.

@lyskov
Copy link
Member

lyskov commented May 12, 2023

@jlblancoc , i just took a fresh look at the code and new API docs and i think there might be a way to get original type information! Let me play with this and i will get back to you.

@jlblancoc
Copy link
Contributor Author

Thanks, it would be great! In the meanwhile I tried with tricks redefining __SIZE_TYPE__ and alike, but that's a dead end...

@lyskov
Copy link
Member

lyskov commented May 16, 2023

@jlblancoc - it turned out that general solution for this issue might require a bit more research. So for now i found a decent workaround that we can use but it will require listing all std:: types that we confident could be back-converted. Right now i added only std::size_t but it should be possible to add correct handling for any non-template types. So could you please try to pull from recent master and see if this works for you? And if/when you discover similar error try adding appropriate std type into this list: 30e53cc#diff-098881659b3bdd75b9b57555bb70247c118325422929347527db13c1e67bd0faR471 - PR with updated list will be welcome 😬 - as i mentioned this will currently only work for non template types. Thanks,

@jlblancoc
Copy link
Contributor Author

Sergey, this is definitively awesome, thanks! 👏👏

Sure, I'll try it and expand in a PR after testing...

Cheers

@jlblancoc
Copy link
Contributor Author

PS: it would add a certain performance loss, but if you prefer it, I could also add an optional set of types loadable from the config file, in addition to the standard std types I will add.

@lyskov
Copy link
Member

lyskov commented May 16, 2023

PS: it would add a certain performance loss, but if you prefer it, I could also add an optional set of types loadable from the config file, in addition to the standard std types I will add.

-- that would be great, PR adding this functionality will be welcome!

Re possible performance hit: - i would not worry about this. We can make sure that standard_names set is only initialized from options once. For instance the most straightforward way to achieve this would be to refactor line 30e53cc#diff-098881659b3bdd75b9b57555bb70247c118325422929347527db13c1e67bd0faR471 to something like this:

static std::set<string> standard_names;
if( standard_names.empty() ) {
  standard_names.insert( {"std::size_t"} );
  <add names from config options here>
}

jlblancoc added a commit to MRPT/mrpt that referenced this issue May 16, 2023
@jlblancoc
Copy link
Contributor Author

@lyskov After your fix, all cl.def(..) cases are fixed regarding size_t and friends, that's all good.

But now, the same problem remains in the auto-generated wrappers PyCallBack_*, see for example this where "size_t" has been converted to "unsigned long", even after the latest commits.

I'll try to take a look at binder code to see where those wrappers are generated, but you would probably have it already in your head and provide a faster fix (?).

@jlblancoc jlblancoc reopened this May 16, 2023
@jlblancoc
Copy link
Contributor Author

Maybe here?

https://github.com/RosettaCommons/binder/blob/master/source/class.cpp#L701-L720

@lyskov
Copy link
Member

lyskov commented May 17, 2023

-- right! Binding code generation for member functions will needs to be updated as will. - I will looks this up soon!

@lyskov
Copy link
Member

lyskov commented May 17, 2023

OK, i just pushed the patch, - @jlblancoc please see if it works for you! Thanks,

@jlblancoc
Copy link
Contributor Author

Thanks for the quick fix @lyskov !

I tested it and indeed it seems to have worked well. I'm waiting for packages to be built in armhf to fully confirm it works now...

@jlblancoc
Copy link
Contributor Author

My package still have one remaining error for armhf but it's a too particular problem in my API... binder made the grade here too, thanks!! 💯

@jlblancoc
Copy link
Contributor Author

I found one more edge case:

	void push_back(const double v) { internalPushBack(v); }
	void push_back(const std::string v) { internalPushBack(v); }
	void push_back(const uint64_t v) { internalPushBack(v); }

got converted into:

		cl.def("push_back", (void (mrpt::containers::yaml::*)(const double)) &mrpt::containers::yaml::push_back, "Append a new value to a sequence.\n \n\n std::exception If this is not a sequence \n\nC++: mrpt::containers::yaml::push_back(const double) --> void", pybind11::arg("v"));
		cl.def("push_back", (void (mrpt::containers::yaml::*)(const std::string)) &mrpt::containers::yaml::push_back, "C++: mrpt::containers::yaml::push_back(const std::string) --> void", pybind11::arg("v"));
		cl.def("push_back", (void (mrpt::containers::yaml::*)(const unsigned long)) &mrpt::containers::yaml::push_back, "C++: mrpt::containers::yaml::push_back(const unsigned long) --> void", pybind11::arg("v"));

Note the replacement of uint64_t to unsigned long, as it was before...
Removing the "const" fixes it:

	void push_back(uint64_t v) { internalPushBack(v); }

gives the correct uint64_t:

+               cl.def("push_back", (void (mrpt::containers::yaml::*)(uint64_t)) &mrpt::containers::yaml::push_back, "C++: mrpt::containers::yaml::push_back(uint64_t) --> void", pybind11::arg("v"));

Maybe related to string matching in types?

For now I'll drop the "const" on my side, but it's not expected ideal behavior...

@jlblancoc jlblancoc reopened this May 18, 2023
@lyskov
Copy link
Member

lyskov commented May 18, 2023

Thank you for detailed examples @jlblancoc ! Let me think about this for a bit. Clearly more general solution is needed here. I have a few ideas to try and experiment and will post here if/when i have something better then the current code.

@jlblancoc
Copy link
Contributor Author

jlblancoc commented Sep 5, 2023

For the records, I "solved" this in my project via a mass generation of wrapping code via binder, then applying hand-written patches to the problematic parts. Not ideal, but it's really hard to handle all these usages of variable-size types in a portable way.

PS: Link to relevant "hacky" parts in my project, I hope it can help others:
https://github.com/MRPT/mrpt/blob/develop/python/generate-python.sh#L69-L97

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants