
Refactor third-party libraries to build from source on Linux #5706

Merged (1 commit) on Aug 30, 2019

Conversation

@alessandrogario (Member) commented Aug 16, 2019:

This PR converts all of the libraries required by osquery into source modules by porting their Makefile-based build systems to CMake.

All of the libraries have been rewritten except for OpenSSL, which is built using a formula-based system similar to the old one we were using with brew.

Each library resides in a git submodule pinned to a release tag on the upstream repository. Submodules are fetched lazily at configure time, in response to the find_package(library_name) CMake call.

The new dependency system implements configurable "layers" that determine which library is fetched from where. Three layers are currently implemented: source, formula, and facebook. The configured order is used when looking for libraries.
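As a rough, hedged sketch of the idea (the layer-list variable and handler locations below are illustrative, not the exact names used in this PR), the ordering can be thought of as a prioritized CMAKE_MODULE_PATH with one directory of find_package() handlers per layer:

# Illustrative only: each layer contributes a directory of Find<name>.cmake
# handlers, and the configured order decides which layer may satisfy a
# dependency first.
set(OSQUERY_LIBRARY_LAYERS source formula facebook)  # assumed variable name

foreach(layer IN LISTS OSQUERY_LIBRARY_LAYERS)
  list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/libraries/cmake/${layer}")
endforeach()

# This call then lazily fetches the submodule and configures the library
# through the first layer that provides a handler for it.
find_package(zlib REQUIRED)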

@alessandrogario (Member, Author) left a comment:

The best approach to make sure the config files are a good starting point is to have as few -dev packages as possible installed on the Ubuntu box.

@Smjert and I will probably re-generate them using the custom toolchain and then make a diff and look at what changes.

azure-pipelines.yml (outdated)
inputs:
workingDirectory: $(Build.BinariesDirectory)/build
cmakeArgs: --build . --target format_check
# Temporary; this is disabled because it complains about the config.h files
@alessandrogario (Member, Author):
We need to turn this back on, but we should prevent generated config files from causing this check to fail.

Member:

I think the easiest way is to add an ignore-folder option (if it's not already there) to the script that checks the format.
We can ignore the entire libraries folder and be done with it.
Though if we want it to be more flexible, we should probably also add an option to specify single files.


# The formula will be configured with the C_FLAGS and CXX_FLAGS parameters; we should
# forward them to the configure script when possible
message(WARNING "TODO: Use C_FLAGS")
@alessandrogario (Member, Author):

We are not yet passing the compiler/linker flags to the OpenSSL configure script. We receive them as C_FLAGS and CXX_FLAGS from the main osquery project.

BUILD_COMMAND
"${CMAKE_COMMAND}" -E env CC="${CMAKE_C_COMPILER}"
"${CMAKE_COMMAND}" -E make_directory "${CMAKE_INSTALL_PREFIX}/etc/openssl" &&
$(MAKE) depend -j 10 &&
@alessandrogario (Member, Author):

This is a completely arbitrary value; we should make it configurable and default it to something sane.

Member:

You should be able to remove -j 10 right now with the use of $(MAKE).

Member:

Does this use of $(MAKE) within the BUILD_COMMAND property work with Ninja as the generator?

@alessandrogario (Member, Author):

The formula system was implemented using custom commands that are triggered when the required library files are referenced. The build command is executed in a new environment and is detached from the main project. It should work in theory, but I haven't tested it yet.

@Smjert (Member) commented Aug 26, 2019:

I had to change it to plain make because $(MAKE) is only valid if the generator invoking the command is make; it's what coordinates with the jobserver to avoid spawning too many jobs.
With this change it will spawn more jobs than requested when building osquery, but it shouldn't be too much of an issue, especially for building OpenSSL.
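For illustration only (the output path, working-directory variable, and overall layout are hypothetical, and the real formula runner is more involved), a custom command in this spirit would use plain make so it also works under the Ninja generator:

# Hypothetical sketch of a formula-style custom command: the OUTPUT is the
# library file whose reference triggers the build, and plain `make` is used
# because $(MAKE) is only expanded by Makefile generators.
add_custom_command(
  OUTPUT "${CMAKE_INSTALL_PREFIX}/lib/libcrypto.a"
  COMMAND "${CMAKE_COMMAND}" -E make_directory "${CMAKE_INSTALL_PREFIX}/etc/openssl"
  COMMAND make depend
  COMMAND make
  WORKING_DIRECTORY "${OPENSSL_SOURCE_DIR}"  # assumed variable
  COMMENT "Building OpenSSL through the formula runner"
)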

function(opensslMain)
generateMetadataTargets()

if(NOT DEFINED thirdparty_zlib_INCLUDE_DIRS OR
@alessandrogario (Member, Author):

The metadata exchange mechanism relies on the fact that, if the required parameters are missing, the configuration step still succeeds (without creating the formula target).
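A minimal sketch of that guard (the second variable name is assumed for illustration):

# If the metadata published for the zlib dependency is not available yet, bail
# out of the formula function quietly: configuration still succeeds and the
# formula target is simply not created.
if(NOT DEFINED thirdparty_zlib_INCLUDE_DIRS OR
   NOT DEFINED thirdparty_zlib_LIBRARIES)  # second name is an assumption
  return()
endif()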

endif()

# Generate the formula runner that will actually build the libraries
# TODO(alessandro): Change it so we use a completely new environment
@alessandrogario (Member, Author):

We should launch the build command with a completely clean environment

@Smjert (Member) commented Aug 16, 2019:

A series of things still to do:

  • Try to clone submodules with --depth 1 where possible, to speed up the process on the CI (Alessandro: enabled where it was easy to add; some remotes do not support shallow clones)
  • Do a last pass on all the libs to check that they only compile what's necessary (Stefano)
  • Finish fixing the Windows build failures
  • Remove duplicated third-party library directories
  • Fix the build with glibc >= 2.28 (like Ubuntu 18.10)
  • Rebase on the master branch (solving the conflict in BUILD.md)
  • Try to clone only the necessary child submodules (as mentioned in #5706 (comment)), otherwise the CI configure phase is slow (Alessandro: everything is now lazy-cloned as the project imports the libs; shallow clones also help a bit)
  • Some tables fail to be found by sqlite (Alessandro)
  • Fix failing tests
  • Fix the OpenSSL formula, which fails to configure and does not take all the necessary flags (Stefano: I'll make a hotfix, this is still a WIP)
  • Finish implementing the OpenSSL formula (#5706 (comment))
  • Implement a way, in the existing script, to skip formatting files in a folder, specifically the libraries one (if it's not already present, #5706 (comment))
  • Clean up the environment before building a formula (#5706 (comment))
  • Improve the way patching works so that it doesn't need a manual submodule deinit when submodules are updated.

Glibc 2.28 deprecated some macros and moved them into a different include if one really needs them:

* The macros 'major', 'minor', and 'makedev' are now only available from
  the header <sys/sysmacros.h>; not from <sys/types.h> or various other
  headers that happen to include <sys/types.h>.  These macros are rarely
  used, not part of POSIX nor XSI, and their names frequently collide with
  user code; see https://sourceware.org/bugzilla/show_bug.cgi?id=19239 for
  further explanation.

  <sys/sysmacros.h> is a GNU extension.  Portable programs that require
  these macros should first include <sys/types.h>, and then include
  <sys/sysmacros.h> if __GNU_LIBRARY__ is defined.

This shouldn't be a problem when using the custom toolchain, but if we also want to support building with the system one, this has to be fixed.
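A hedged configure-time option (the cache variable and definition below are illustrative, not something this PR adds) is to probe for the header and let patched sources include it conditionally:

# Illustrative sketch: detect <sys/sysmacros.h> and expose a definition the
# affected third-party sources could check before including it.
include(CheckIncludeFile)

check_include_file("sys/sysmacros.h" HAVE_SYS_SYSMACROS_H)
if(HAVE_SYS_SYSMACROS_H)
  add_compile_definitions(HAVE_SYS_SYSMACROS_H=1)
endif()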

@theopolis (Member) commented Aug 19, 2019:

Some overall notes

  1. If we are going to land some subset of these as rebase + merge (vs squash) then I ask that the 75434dc and f62420e be squashed together since f62420e is mostly moving content around.
  2. At the end of this merge into master can we make sure there are "fewest possible" top-level directories used to manage dependencies. ./third-party and ./libraries are OK, any more is not OK, and if we can achieve simply ./libraries that is super-OK. (can we remove ./third-party-formula please)
  3. +1 to making sure CI still runs as quickly as possible (under 10mins would be great).

@@ -0,0 +1,210 @@
# Copyright (c) 2014-present, Facebook, Inc.
Member:

The path libraries/cmake/facebook/modules/api.cmake doesn't make a lot of sense.

Alternatively, move this logic into /cmake/

Member:

I don't feel strongly about this. Your call on where api and utils are located.

@alessandrogario (Member, Author):

I have stored the APIs there since they are layer-specific. Since the libraries folder (at least on the CMake side) is self-contained and not used directly by the main project, I ended up saving the file near the entry point (the find_package handlers).

We can move it elsewhere, keeping in mind, however, that the find_package handlers should keep using relative paths.
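For example (the relative path here is illustrative), a find_package handler can keep locating the layer API regardless of where the folder ends up:

# Resolve the layer API relative to the current Find module rather than the
# project root, so the libraries/ tree stays self-contained.
include("${CMAKE_CURRENT_LIST_DIR}/modules/api.cmake")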

@theopolis (Member):
The approach is great, thanks for fleshing this out! My inline comments are low-pri. Take a look at the 1, 2, 3 points I made in the overall PR comment. If they are reasonably addressed then I am good to go on this!

@Smjert (Member) commented Aug 19, 2019:

Some overall notes

  1. If we are going to land some subset of these as rebase + merge (vs squash) then I ask that the 75434dc and f62420e be squashed together since f62420e is mostly moving content around.
  2. At the end of this merge into master can we make sure there are "fewest possible" top-level directories used to manage dependencies. ./third-party and ./libraries are OK, any more is not OK, and if we can achieve simply ./libraries that is super-OK. (can we remove ./third-party-formula please)
  3. +1 to making sure CI still runs as quickly as possible (under 10mins would be great).
  1. I think we will squash everything, separating per author where possible.
  2. I've added it to the TODO checklist; there's some unwanted duplication right now.
  3. The build was never under 10 minutes on the CI; this is more of a hardware problem. That being said, in the TODO checklist there are a couple of things which can speed up the slower configure phase.

@Smjert (Member) commented Aug 20, 2019:

I peeked briefly at the errors and I think these come from SleuthKit using an embedded (auto) version of SQLite, which is outdated. The solution is to change SleuthKit's configuration to use an existing install of SQLite (the one included in the build) and set that version of SQLite as a dependency for SleuthKit.

Thanks for having a look at it.
I saw that too, but it should not already be compiled; I'll check again.
Also, as a last resort I went back to using all prebuilt libraries, but the issue is still there; I'll investigate more later.

@alessandrogario (Member, Author):
Do a last pass on all the libs to check that they only compile what's necessary (Stefano)

I think this should be low-pri. If we can get the new dependency management working then we can limit what is built in follow up PRs. I think resolving in follow ups is better as it expands the audience of end-to-end testers.

It's a matter of correctness: there were symbol collisions because the libraries were not made of library code only; they also had program/CLI code.
The tests were not running because many libraries were incorrectly providing a main().
I already did a pass in commit ffbb682.

Expanding on Stefano's answer: I captured the .cpp/.c files in the configure-based libraries while trying to remove the most obvious main() instances (when they were under folders like tests/tools/samples), but some of them still remained when I created the PR (my initial goal was simple and I just wanted the project to compile!).

@Smjert (Member) commented Aug 20, 2019:

I peeked briefly at the errors and I think these come from SleuthKit using an embedded (auto) version of SQLite, which is outdated. The solution is to change SleuthKit's configuration to use an existing install of SQLite (the one included in the build) and set that version of SQLite as a dependency for SleuthKit.

So you were right; @alessandrogario fixed that.
Also, the error I saw when creating the virtual table in attachTableInternal is another issue I've just discovered; it's not related to this PR, so I'll open an issue later.

@theopolis (Member):
Sorry about the noise, but you'll need to rebase one more time for changes to BUILD; I incorporated the relevant changes from this stack. Those did not include any documentation related to the new third-party management, though. We can choose to document this change in a follow-up diff or stack it here too.

.gitmodules (several review threads, outdated and resolved)
@theopolis (Member):
Though indeed, when pulling, that's not acceptable... solving this properly is a bit tricky. I think the other solution is to always copy the unpatched submodules into the build folder, patch them there, and have CMake refer to that.
I'll try something...

We can iterate on this in follow up PRs as well.

@theopolis (Member):
FYI, I was trying to speed up the checkout in my last commit to this branch. Feel free to overwrite/replace/remove if it messed something up or there are fundamental issues with the changes.

@Smjert (Member) commented Aug 25, 2019:

I've rebased on master and fixed one thing; otherwise even the previous build would fail.

@Smjert (Member) commented Aug 26, 2019:

For the only test that's still failing, it seems that zlib is not using the correct CRC32 implementation for gzip.
The header, the compressed data, and part of the trailer are all equal to what's expected; only 4 bytes differ, and they are the CRC32.
I've also double-checked the RFC, which has a reference implementation of the CRC32 it expects, and its result matches what the test expects.
I think there might be some header collision; I'm trying to track it down.

@Smjert (Member) commented Aug 26, 2019:

So the issue is a bit tricky:
util-linux and zlib both export a crc32 symbol. I have no idea why the linker doesn't complain about multiple definitions (EDIT: probably --whole-archive is needed to detect those), but the problem is that where it should pick the zlib implementation it actually uses the util-linux one, which yields a different result.
I've also noticed that the crc32 symbol was exported by the old pre-built libraries too, but this is definitely an issue.

I tried to avoid compiling that symbol, but it's used by libblkid/src/superblocks/superblocks.c and libblkid/src/superblocks/nilfs.c, and the first in particular cannot be removed.
So the only remaining solution for me is to patch the library and rename the crc32 function to something that doesn't collide with zlib.
Any other suggestions?

@theopolis (Member):
Interesting, can we use linker flags instead of patching? I’m not sure what flags or if this is possible, but we should look.

@Smjert (Member) commented Aug 27, 2019:

Interesting, can we use linker flags instead of patching? I’m not sure what flags or if this is possible, but we should look.

I've checked: using the preprocessor, you can pass -Dcrc32=utillinux_crc32 and the symbol will be renamed, including its usages.
Also, the crc32.h header is not included by any other header of the library, which means we simply don't share it and everything should work.
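A minimal sketch of that rename on the CMake side (the target name is assumed):

# Rename util-linux's crc32 at compile time so it no longer collides with the
# crc32 exported by zlib; usages inside util-linux are renamed as well.
target_compile_definitions(thirdparty_util-linux PRIVATE
  crc32=utillinux_crc32
)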

@theopolis (Member):
Great progress, how are we doing on the list of TODOs? Should we squash everything into a small commit set?

@theopolis (Member):
A single commit might be good too, or at least a very small set, to make bisecting easier.

@Smjert (Member) commented Aug 27, 2019:

A single commit might be good too, or at least a very small set, to make bisecting easier.

If possible we wanted to preserve everyone's contribution, even though, to be honest, I'm not really sure how much of that is possible, given that our commits are mixed and sometimes directly depend on each other.

I've updated the TODO list with what there is still to do; I'll update it again if I missed something.
I didn't put it there, but we also have to take into account that this was the first step towards using the custom toolchain.
I've been working on the custom toolchain build in parallel (it's currently in a repo I own), but I was blocked on verifying its correctness beyond compiling some simple examples.
Today I quickly tested it with osquery: with a couple of changes to a copy of this branch I managed to build osquery on Ubuntu 18.10 and make it run on CentOS 6.
That being said, there is still something to do:

  • Regenerate every config.h of the third party libraries and derive the correct defines to pass
  • Some different flags are needed to compile with the custom toolchain
  • Be sure that headers are not taken from the system, but only from the project or the sysroot the toolchain provides (see the sketch after this list)
  • Ideally we support only the custom toolchain for now; then we can think of a way to switch between the system and the custom toolchain (even though, imho, we can only support two options, because compile flags have to be changed for osquery and the third-party libraries when changing toolchain, which is not easy, if not impossible, to derive automatically).
  • Make the necessary changes to CPack so that it requires the correct packages.
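For the header-isolation point above, a hedged toolchain-file sketch (all paths and values are hypothetical) could look like this:

# Hypothetical toolchain file fragment: point CMake at the custom toolchain's
# sysroot so headers and libraries resolve there rather than on the host.
set(CMAKE_SYSROOT "/opt/osquery-toolchain/sysroot")
set(CMAKE_C_COMPILER "/opt/osquery-toolchain/bin/clang")
set(CMAKE_CXX_COMPILER "/opt/osquery-toolchain/bin/clang++")

# Only search for headers and libraries inside the sysroot, never on the host.
set(CMAKE_FIND_ROOT_PATH "${CMAKE_SYSROOT}")
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)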

@theopolis added the "cmake" (pure cmake changes) and "libraries" (for things referring to osquery third party libraries) labels on Aug 29, 2019.
Add a way to compile third-party libraries from source instead of downloading prebuilt ones.
Each library's source code is downloaded with git into a submodule at configure time,
in response to the find_package(library_name) CMake call,
except for OpenSSL where the official source archive is used.
Each submodule is attached to a release tag on its own upstream repository.
All the libraries are built using CMake directly, except for OpenSSL, which uses a formula system
that permits building libraries with a separate build system
when there's no easy way to integrate it directly with CMake.

This new dependency system determines which library is fetched from where using the concept of "layers".
Currently we have three of them: source, formula, facebook,
where the last layer represents the pre-built libraries.
The provided order will be used when looking for libraries.

A system to patch submodule source code has been added and it's currently used with googletest, libudev and util-linux.
Patches should be put under libraries/cmake/source/<library name>/patches/<submodule>,
where <submodule> is often a single directory named "src", but in other cases, like AWS,
there are multiple submodules with more specific names.
If for whatever reason the submodule cloning or the patching fails,
the submodule has to be unregistered and its folder should be cleared.
This should be achievable with "git submodule deinit -f <submodule path>".

Some other changes to existing functionality follow:

- Changed the CMake variable BUILD_TESTING to OSQUERY_BUILD_TESTS
  to avoid enabling tests on third party libraries.
  Due to an issue with glog, the BUILD_TESTING variable
  will always be forced to OFF.
- Moved compiler and linker flags to their own file cmake/flags.cmake
- Moved all the third-party CMakeLists.txt used for pre-built libraries under libraries/cmake/facebook
- Added the --exclude-folders option to tools/format-check.py and tools/git-clang-format.py,
  so that it's possible to ignore any third party library source code.
- The format and format_check targets use the new --exclude-folders option
  to exclude libraries/cmake/source from formatting.
- The test and osquery binaries are properly compiled with PIE (#5611)

Co-authored-by: Stefano Bonicatti <stefano.bonicatti@gmail.com>
Co-authored-by: Teddy Reed <teddy@casualhacking.io>
@theopolis changed the title from "Thirdparty submodules" to "Refactor third-party libraries to build from source on Linux" on Aug 30, 2019.