The SIO2 project
SIO-2089

Investigate feasibility of filetracker compression and deduplication

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: TAG 2017/18 Sprint 2
    • Fix Version/s: None
    • Component/s: Filetracker
    • Labels:
      None

      Description

      Investigate possible options of compressing filetracker contents (different compression algorithms, compression/decompression logic, etc.). Also check if Szkopuł filetracker contains a significant number of duplicate files.

        Activity

        Radosław Waśko made changes -
        Field Original Value New Value
        Assignee Radosław Waśko [ radeusgd ]
        Radosław Waśko added a comment -
        Used tools: https://gist.github.com/radeusgd/6119fa1528fc1fb0b1d26b287bd33db8

        Gain (a slightly misleading name) is the ratio of the size after compression or deduplication to the original size, so lower is better.

        Deduplication (SHA):
        ('Highest count:', 14794 - there's a file that has 14794 exact instances)
        ('Total size:', 1101947332565)
        ('Dedup size:', 638640713188)
        ('Gain:', 57, '%')
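
As a rough illustration of how deduplication figures like those above could be gathered: the sketch below is not the code from the linked gist. The choice of SHA-256 (the comment only says "SHA"), the names `file_sha256` and `dedup_stats`, and the plain directory walk are all assumptions made for this example. The gain is truncated to an integer percent, which is consistent with the reported 57.

```python
import hashlib
import os
from collections import Counter


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedup_stats(root):
    """Walk `root` and compare the total size with the size of unique contents."""
    counts = Counter()  # digest -> number of files with that exact content
    sizes = {}          # digest -> size in bytes of a single copy
    total_size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # hypothetical policy: do not follow symlinks
            size = os.path.getsize(path)
            digest = file_sha256(path)
            counts[digest] += 1
            sizes[digest] = size
            total_size += size
    dedup_size = sum(sizes.values())  # one copy of each distinct content
    return {
        "highest_count": max(counts.values(), default=0),
        "total_size": total_size,
        "dedup_size": dedup_size,
        # Gain as defined in this comment: size after / size before, in percent.
        "gain_percent": 100 * dedup_size // total_size if total_size else 0,
    }
```

Calling `dedup_stats("/path/to/filetracker/root")` (a placeholder path) would return the four values in one dictionary. Hashing every file reads the whole store once, so on a store of about 1.1 TB this is an I/O-bound job.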

        Gzip compression (50% of data analyzed):
        ('Original size:', 589263912900)
        ('Compressed size:', 236547224239)
        ('Gain:', 40, '%')

        Xz compression (slower, 30-40% analyzed):
        ('Original size:', 64855014576)
        ('Compressed size:', 19607237016)
        ('Gain:', 30, '%')
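
The compression gains can be measured in a similar way. The following is only a sketch under assumptions: it uses Python's standard `gzip` and `lzma` modules with their default settings, and the helper names `gain_percent` and `compression_gains` are hypothetical. The linked gist may use different tools or compression levels.

```python
import gzip
import lzma


def gain_percent(new_size, original_size):
    """Size after / size before, as a truncated integer percent (lower is better)."""
    if original_size == 0:
        return 0
    return 100 * new_size // original_size


def compression_gains(data):
    """Return the gzip and xz gain for one bytes object."""
    gzip_size = len(gzip.compress(data))
    xz_size = len(lzma.compress(data))  # lzma.compress writes the .xz format by default
    return {
        "gzip": gain_percent(gzip_size, len(data)),
        "xz": gain_percent(xz_size, len(data)),
    }


if __name__ == "__main__":
    # Repetitive sample input standing in for a stored file.
    sample = b"int main() { return 0; }\n" * 1000
    print(compression_gains(sample))

    # Re-deriving the reported figures from the raw byte counts above.
    print(gain_percent(236547224239, 589263912900))  # gzip
    print(gain_percent(19607237016, 64855014576))    # xz
```

Because the gzip and xz runs covered different subsets of the data, their gains are not directly comparable; running both compressors on the same sample, as `compression_gains` does, avoids that problem.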
        Radosław Waśko added a comment -
        Also, it seems that filetracker DELETE doesn't work, so executables and outputs of user submissions that are no longer needed remain on the server.

        The eval folder takes up about 220 GB of space; can it most likely be deleted?
        Radosław Waśko made changes -
        Status New [ 10000 ] Open [ 1 ]
        Radosław Waśko made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Radosław Waśko [ radeusgd ] Szymon Acedański [ accek ]
        Resolution Fixed [ 1 ]
        Szymon Acedański made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition           Time In Source Status   Execution Times   Last Executer      Last Execution Date
        New -> Open          6d 22h 43m              1                 Radosław Waśko     2018-04-17 12:40
        Open -> Resolved     49d 4h 22m              1                 Radosław Waśko     2018-06-05 17:02
        Resolved -> Closed   294d 38m                1                 Szymon Acedański   2019-03-26 16:41

          People

          • Assignee:
            Szymon Acedański
            Reporter:
            Pavel Senchanka
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved: