[Buildroot] [PATCH] support/scripts/cve.py: switch to NVD JSON version 2.0

Daniel Lang dalang at gmx.at
Tue Aug 1 14:13:03 UTC 2023


Hello Thomas,

On 31.07.23 23:52, Thomas Petazzoni via buildroot wrote:
> Hello Daniel,
>
> On Mon, 31 Jul 2023 22:14:20 +0200
> Daniel Lang <dalang at gmx.at> wrote:
>
>> The currently used feed is deprecated and will be retired by NVD in
>> September 2023 [0].
>> The new API returns up to 2000 CVEs every 5 seconds (without API key) [1].
>> Instead of requesting individual years as with the feed, one can specify
>> two timestamps as a range. Any CVE changed within this range is returned.
>> Therefore every single CVE is stored in a separate JSON file.
>> All fields returned by the API are saved for future use.
>> This results in over 200000 files grouped by year with ~800MiB total.
>>
>> [0]: https://nvd.nist.gov/General/News/change-timeline
>> [1]: https://nvd.nist.gov/developers/start-here
>>
>> Signed-off-by: Daniel Lang <dalang at gmx.at>
>
> Wow, thanks for working on this! Is the storing of 200k files workable,
> or do we need to consider some other option like a local sqlite
> database or something?

From testing on my system, it seems to be workable.
Generating pkg-stats for all packages takes roughly the same time with the
old and the new code:

old: ./support/scripts/pkg-stats --html old.html --nvd-path dl/buildroot-nvd/ --disable url,upstream,cpe   252,39s user 45,10s system 100% cpu 4:54,85 total
new: ./support/scripts/pkg-stats --html new.html --nvd-path dl/buildroot-nvd/ --disable url,upstream,cpe   250,04s user 46,24s system 100% cpu 4:53,72 total

I did consider an SQLite database, given that that's the approach Yocto uses.
In the end I decided against it, as I wasn't sure how future-proof it would be.
With the current approach, additional information (score, description, ...)
can be added or used for other purposes without having to download again,
whereas with a database I thought I had to choose up front which fields to store.
In hindsight, I could simply have added a column for every available field.
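To illustrate the trade-off, here is a minimal Python sketch of a file-per-CVE
layout, assuming each entry of the API's `vulnerabilities` list is written
unchanged to `<nvd-dir>/<year>/<CVE-ID>.json`. The helper names `save_cve` and
`load_field` and the exact file naming are hypothetical, not taken from the
patch. Because the whole record is kept, a field nobody reads today (for
example a description) can be pulled out later without downloading again.

```python
import json
import os
import tempfile


def save_cve(nvd_dir, vuln):
    """Write one API entry to <nvd_dir>/<year>/<id>.json (hypothetical layout).

    Assumes the entry looks like {"cve": {"id": "CVE-YYYY-NNNN", ...}}.
    """
    cve_id = vuln["cve"]["id"]
    year = cve_id.split("-")[1]
    year_dir = os.path.join(nvd_dir, year)
    os.makedirs(year_dir, exist_ok=True)
    path = os.path.join(year_dir, cve_id + ".json")
    with open(path, "w", encoding="utf-8") as f:
        # Store the complete record so fields unused today stay available.
        json.dump(vuln, f)
    return path


def load_field(path, *keys):
    """Read a stored CVE file and walk down the given keys/indices."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for key in keys:
        data = data[key]
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        record = {"cve": {"id": "CVE-2023-1000",
                          "descriptions": [{"lang": "en", "value": "example"}]}}
        p = save_cve(tmp, record)
        print(os.path.relpath(p, tmp))
        print(load_field(p, "cve", "descriptions", 0, "value"))
```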

If there are concerns, I can see whether I have the time to also implement a
database approach for comparison.

I'm not sure whether updating would be faster with a database. On my system it
takes ~1.5 seconds to save each batch of 2k CVEs to files. But I guess the main
bottleneck is the API, given that the initial download took upwards of 30 minutes
during my test runs and only ~2.5 minutes of that were spent creating files.

>
> Another question: did you do a run of "make pkg-stats" before and after
> your patch to compare the results in terms of CVEs reported for each
> Buildroot package?

I did. For a 1:1 comparison, the sorting on line 185 has to be changed to

    for cve_file in sorted(os.listdir(year_folder)):

Otherwise, CVEs within a package are sorted differently, which makes a
comparison very hard.
Running pkg-stats with this change generates identical reports; only the
generation timestamp differs:

diff old.html new.html
57505c57505
< <p><i>Updated on 2023-08-01 07:34:14.594976, git commit 22e476d7886163484d233803b42a2a4c2b588a5b</i></p>
---
> <p><i>Updated on 2023-08-01 08:40:33.290711, git commit 22e476d7886163484d233803b42a2a4c2b588a5b</i></p>
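A minimal Python sketch for checking such reports automatically, assuming the
"Updated on" paragraph is the only line expected to differ between runs.
`report_diff` is a hypothetical helper, not part of pkg-stats.

```python
import difflib


def report_diff(old_lines, new_lines, ignore_prefix="<p><i>Updated on "):
    """Unified diff of two pkg-stats reports, ignoring the timestamp line.

    Lines whose stripped text starts with ignore_prefix are dropped from both
    inputs before diffing; an empty result means the reports otherwise match.
    """
    def keep(lines):
        return [line for line in lines
                if not line.strip().startswith(ignore_prefix)]

    return list(difflib.unified_diff(keep(old_lines), keep(new_lines),
                                     fromfile="old", tofile="new", lineterm=""))
```

For example, passing `open("old.html").read().splitlines()` and the same for
`new.html` returns an empty list when the reports match apart from the timestamp.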

Additional information about sorting:
Currently, the code relies on the order NVD used when creating the tar.gz.
NVD sorts CVE-2023-10000 before CVE-2023-1000, which might not be ideal.
The new implementation adds custom sorting, as we would otherwise
rely on the filesystem ordering.
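A numeric sort key avoids depending on either NVD's or the filesystem's
ordering. The sketch below is one possible approach, not necessarily the sort
the patch implements: `cve_sort_key` (a hypothetical name) extracts the year and
sequence number from names like `CVE-2023-1000.json` and compares them as
integers. A plain string sort compares character by character, so, for example,
CVE-2023-999 ends up after CVE-2023-10000.

```python
import re

CVE_ID = re.compile(r"CVE-(\d+)-(\d+)")


def cve_sort_key(name):
    """Sort key comparing CVE year and sequence number as integers.

    Accepts bare IDs or file names such as "CVE-2023-1000.json"; names
    without a CVE ID sort after all CVEs, alphabetically.
    """
    match = CVE_ID.search(name)
    if match is None:
        return (1, 0, 0, name)
    return (0, int(match.group(1)), int(match.group(2)), name)


files = ["CVE-2023-10000.json", "CVE-2023-999.json",
         "CVE-2022-5.json", "CVE-2023-1000.json"]
print(sorted(files))
# ['CVE-2022-5.json', 'CVE-2023-1000.json', 'CVE-2023-10000.json', 'CVE-2023-999.json']
print(sorted(files, key=cve_sort_key))
# ['CVE-2022-5.json', 'CVE-2023-999.json', 'CVE-2023-1000.json', 'CVE-2023-10000.json']
```

In a loop like the one on line 185, this would become
`sorted(os.listdir(year_folder), key=cve_sort_key)`.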

>
> Thomas

One final note: I'm in no way a Python expert, so any optimizations or
general input are welcome.

Regards,
Daniel



