The SIO2 project
SIO-2089

Investigate feasibility of filetracker compression and deduplication

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: TAG 2017/18 Sprint 2
    • Fix Version/s: None
    • Component/s: Filetracker
    • Labels:
      None

      Description

      Investigate options for compressing filetracker contents (different compression algorithms, compression/decompression logic, etc.). Also check whether the Szkopuł filetracker contains a significant number of duplicate files.

        Activity

        Radosław Waśko added a comment -
        Used tools: https://gist.github.com/radeusgd/6119fa1528fc1fb0b1d26b287bd33db8

        Gain (a possibly misleading name) is the ratio of the size after compression or deduplication to the original size, so lower is better.

        Deduplication (SHA):
        ('Highest count:', 14794 - there's a file that has 14794 exact instances)
        ('Total size:', 1101947332565)
        ('Dedup size:', 638640713188)
        ('Gain:', 57, '%')
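A rough sketch of how such deduplication figures could be computed, not the code from the linked gist. It assumes SHA-256 (the comment only says "SHA") and that the gain percentage is truncated to an integer, which is consistent with the reported 57%. The names `file_digest` and `dedup_stats` are hypothetical.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded whole."""
    h = hashlib.sha256()  # assumption: the original only says "SHA"
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup_stats(root):
    """Walk `root` and report sizes before and after deduplication."""
    size_by_digest = {}   # digest -> size of a single copy
    count_by_digest = {}  # digest -> number of identical copies
    total_size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so nothing is counted twice
            size = os.path.getsize(path)
            digest = file_digest(path)
            total_size += size
            size_by_digest[digest] = size
            count_by_digest[digest] = count_by_digest.get(digest, 0) + 1
    dedup_size = sum(size_by_digest.values())
    return {
        "highest_count": max(count_by_digest.values(), default=0),
        "total_size": total_size,
        "dedup_size": dedup_size,
        # Gain as defined above: size after dedup / original size, in percent.
        "gain_percent": 100 * dedup_size // total_size if total_size else 0,
    }
```

Calling `dedup_stats("/path/to/filetracker")` (a placeholder path) returns a dictionary with the same four quantities as the output above.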

        Gzip compression (50% of data analyzed):
        ('Original size:', 589263912900)
        ('Compressed size:', 236547224239)
        ('Gain:', 40, '%')

        Xz compression (slower, 30-40% analyzed):
        ('Original size:', 64855014576)
        ('Compressed size:', 19607237016)
        ('Gain:', 30, '%')
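The compression measurements could be reproduced along these lines. This is a minimal sketch using Python's standard `gzip` and `lzma` modules with default settings; the compression levels actually used are not stated in the comment, and `compression_gain` is a hypothetical helper name.

```python
import gzip
import lzma


def compression_gain(data, compress):
    """Return (original size, compressed size, gain in percent).

    Gain follows the definition above: compressed / original,
    truncated to an integer, so lower is better.
    """
    original = len(data)
    compressed = len(compress(data))
    gain = 100 * compressed // original if original else 0
    return original, compressed, gain


if __name__ == "__main__":
    # Illustrative sample only; real filetracker contents will behave differently.
    sample = b"int main() { return 0; }\n" * 1000
    for name, fn in (("gzip", gzip.compress), ("xz", lzma.compress)):
        print(name, compression_gain(sample, fn))
```

Measuring each file separately and summing the sizes, rather than compressing one large archive, matches a store that compresses files individually; whether the original tool did this is an assumption.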
        Radosław Waśko added a comment -
        It also seems that filetracker DELETE doesn't work, so executables and outputs of user submissions that are no longer needed remain on the server.

        The eval folder takes up about 220 GB and can most likely be deleted?

          People

          • Assignee: Szymon Acedański
          • Reporter: Pavel Senchanka
          • Votes: 0
          • Watchers: 2
