[Buildroot] [PATCH] support/scripts/cve.py: switch to NVD JSON version 2.0
Daniel Lang
dalang at gmx.at
Tue Aug 1 14:13:03 UTC 2023
Hello Thomas,
On 31.07.23 23:52, Thomas Petazzoni via buildroot wrote:
> Hello Daniel,
>
> On Mon, 31 Jul 2023 22:14:20 +0200
> Daniel Lang <dalang at gmx.at> wrote:
>
>> The currently used feed is deprecated and will be retired by NVD in
>> September 2023 [0].
>> The new API returns up to 2000 CVEs every 5 seconds (without API key) [1].
>> Instead of requesting individual years as with the feed, one can specify
>> two timestamps as a range. Any CVE changed in this time is returned.
>> Therefore every single CVE is stored in a separate JSON file.
>> All fields returned by the API are saved for future use.
>> This results in over 200000 files grouped by year with ~800MiB total.
>>
>> [0]: https://nvd.nist.gov/General/News/change-timeline
>> [1]: https://nvd.nist.gov/developers/start-here
>>
>> Signed-off-by: Daniel Lang <dalang at gmx.at>
>
> Wow, thanks for working on this! Is the storing of 200k files workable,
> or do we need to consider some other option like a local sqlite
> database or something?
From testing on my system I can say that it seems to be workable.
Generating pkg-stats for all packages takes roughly the same time:
old: ./support/scripts/pkg-stats --html old.html --nvd-path dl/buildroot-nvd/ --disable url,upstream,cpe 252,39s user 45,10s system 100% cpu 4:54,85 total
new: ./support/scripts/pkg-stats --html new.html --nvd-path dl/buildroot-nvd/ --disable url,upstream,cpe 250,04s user 46,24s system 100% cpu 4:53,72 total
I did consider a sqlite database, given that that's the approach Yocto uses.
In the end I decided against it, as I wasn't sure how future-proof it would be.
The current approach means that additional information (score, description, ...)
could be added or used for other purposes without having to download again,
whereas I thought I would have to make a selection for the database.
In hindsight I could have just added a column for every piece of information
available.
If there is concern, I can see whether I have the time to also implement a
database approach for comparison.
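
For illustration only, here is a minimal sketch of what a database variant
could look like. It is not what Yocto does and not part of the patch. Instead
of one column per field, it keeps the complete record as JSON text in a single
column, so nothing has to be selected up front. The table layout, the function
names and the sample record are all hypothetical; "id" and "lastModified" are
assumed to be keys of the CVE object returned by the API.

```python
import json
import sqlite3


def open_db(path):
    """Open (or create) the database with a single hypothetical table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS cve ("
        "id TEXT PRIMARY KEY, last_modified TEXT, raw TEXT NOT NULL)"
    )
    return db


def upsert(db, cves):
    """Insert or overwrite records, keeping the full JSON in 'raw'."""
    rows = [(c["id"], c.get("lastModified"), json.dumps(c)) for c in cves]
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR REPLACE INTO cve (id, last_modified, raw) "
            "VALUES (?, ?, ?)",
            rows,
        )


def load(db, cve_id):
    """Return the stored record as a dict, or None if it is unknown."""
    row = db.execute("SELECT raw FROM cve WHERE id = ?", (cve_id,)).fetchone()
    return json.loads(row[0]) if row else None


# Made-up sample record, only to exercise the functions above.
db = open_db(":memory:")
upsert(db, [{
    "id": "CVE-2023-1000",
    "lastModified": "2023-08-01T00:00:00.000",
    "descriptions": [{"lang": "en", "value": "example"}],
}])
record = load(db, "CVE-2023-1000")
missing = load(db, "CVE-1999-0001")
```

Whether this would be faster than individual files is untested.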
Not sure if updating would be faster with a database. It takes ~1.5 seconds
on my system to save a batch of 2k CVEs to files. But I guess the main bottleneck
is the API, given that the initial download took upwards of 30 minutes during my
test runs and only ~2.5 minutes are spent creating files.
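
To make the "one file per CVE, grouped by year" layout concrete, here is a
rough sketch of saving one batch. It is not the code from the patch. The
function name, the <year>/<CVE-ID>.json naming and the sample batch are
assumptions; each batch entry is assumed to wrap the record in a "cve" key
with an "id" such as CVE-2023-1000.

```python
import json
import os
import tempfile


def save_batch(vulnerabilities, nvd_dir):
    """Write each CVE record to <nvd_dir>/<year>/<CVE-ID>.json.

    Returns the number of files written. A CVE that was saved before is
    simply overwritten with the newer record.
    """
    written = 0
    for entry in vulnerabilities:
        cve = entry["cve"]
        cve_id = cve["id"]
        year = cve_id.split("-")[1]  # "CVE-2023-1000" -> "2023"
        year_dir = os.path.join(nvd_dir, year)
        os.makedirs(year_dir, exist_ok=True)
        with open(os.path.join(year_dir, cve_id + ".json"), "w") as f:
            json.dump(cve, f)  # keep every field for future use
        written += 1
    return written


# Made-up batch, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    batch = [
        {"cve": {"id": "CVE-2023-1000", "descriptions": []}},
        {"cve": {"id": "CVE-2022-4321", "descriptions": []}},
    ]
    count = save_batch(batch, tmp)
    years = sorted(os.listdir(tmp))
    files_2023 = os.listdir(os.path.join(tmp, "2023"))
```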
>
> Another question: did you do a run of "make pkg-stats" before and after
> your patch to compare the results in terms of CVEs reported for each
> Buildroot package?
I did. For a 1:1 comparison the sorting on line 185 has to be changed to:
    for cve_file in sorted(os.listdir(year_folder)):
Otherwise CVEs within a package are sorted differently, making a comparison
very hard.
Running pkg-stats with this change generates reports that are identical apart
from the timestamp:
diff old.html new.html
57505c57505
< <p><i>Updated on 2023-08-01 07:34:14.594976, git commit 22e476d7886163484d233803b42a2a4c2b588a5b</i></p>
---
> <p><i>Updated on 2023-08-01 08:40:33.290711, git commit 22e476d7886163484d233803b42a2a4c2b588a5b</i></p>
Additional information about sorting:
Currently the sorting done by NVD when creating the tar.gz is used.
NVD sorts CVE-2023-10000 before CVE-2023-1000, which might not be ideal.
The new implementation adds custom sorting as we would otherwise
rely on the filesystem sorting.
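
As an illustration of the kind of ordering meant here, a sort key that compares
the year and the sequence number as integers could look like the sketch below.
This is not necessarily the sorting used in the patch. The function name and
the file names are made up.

```python
import re

# Matches the start of names such as "CVE-2023-10000.json".
_CVE_RE = re.compile(r"CVE-(\d{4})-(\d+)")


def cve_sort_key(name):
    """Return (year, number) so that IDs compare numerically."""
    match = _CVE_RE.match(name)
    if match is None:
        raise ValueError(f"not a CVE identifier: {name!r}")
    return int(match.group(1)), int(match.group(2))


files = ["CVE-2023-10000.json", "CVE-2023-2000.json", "CVE-2023-1000.json"]

# Plain string sorting puts 10000 before 2000 ...
by_string = sorted(files)
# ... while the numeric key orders them 1000, 2000, 10000.
by_number = sorted(files, key=cve_sort_key)
```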
>
> Thomas
One final note: I'm in no way a Python expert, so any optimization or
general input is welcome.
Regards,
Daniel