D7796: rust-nodemap: input/output primitives

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
gracinet created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  These allow to initiate a `NodeTree` from an immutable opaque
  sequence of bytes, which could be passed over from Python
  (extracted from a `PyBuffer`) or directly mmapped from a file.
 
  Conversely, we can consume
  a `NodeTree`, extracting the bytes that express what
  has been added to the immutable part, together with the
  original immutable part.
  This gives callers the choice to start a new Nodetree.
  After writing to disk, some would prefer to reread for
  best guarantees (very cheap if mmapping), some others will
  find it more convenient to grow the memory that was considered
  immutable in the `NodeTree` and continue from there.
 
  In `load_bytes`, we anticipate a bit on the file format for
  the final version, allowing an offset for fixed data at the
  beginning of the file.
 
  This is enough to build examples running on real data and
  start gathering performance hints.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,8 +17,10 @@
     RevlogIndex,
 };
 use std::fmt;
+use std::mem;
 use std::ops::Deref;
 use std::ops::Index;
+use std::slice;
 
 #[derive(Debug, PartialEq)]
 pub enum NodeMapError {
@@ -132,6 +134,8 @@
 #[derive(Clone, PartialEq)]
 pub struct Block([RawElement; 16]);
 
+pub const BLOCK_SIZE: usize = mem::size_of::<Block>();
+
 impl Block {
     fn new() -> Self {
         Block([-1; 16])
@@ -221,6 +225,57 @@
         }
     }
 
+    /// Create from an opaque bunch of bytes
+    ///
+    /// The created `NodeTreeBytes` is taken after the fixed `offset` from
+    /// `buffer`, of which exactly `amount` bytes are used.
+    ///
+    /// - `buffer` could be derived from `PyBuffer` and `Mmap` objects.
+    /// - `offset` allows for the final file format to include fixed data
+    ///   (generation number, behavioural flags)
+    /// - `amount` is expressed in bytes, and is not automatically derived from
+    ///   `bytes`, so that a caller that manages them atomically can perform
+    ///   temporary disk serializations and still rollback easily if needed.
+    ///   First use-case for this would be to support Mercurial shell hooks.
+    ///
+    /// panics if `buffer` is smaller than `offset + amount`
+    pub fn load_bytes(
+        bytes: Box<dyn Deref<Target = [u8]> + Send>,
+        offset: usize,
+        amount: usize,
+    ) -> Self {
+        NodeTree::new(Box::new(NodeTreeBytes::new(bytes, offset, amount)))
+    }
+
+    /// Retrieve added `Block` and the original immutable data
+    pub fn into_readonly_and_added(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<Block>) {
+        let mut vec = self.growable;
+        let readonly = self.readonly;
+        if readonly.last() != Some(&self.root) {
+            vec.push(self.root);
+        }
+        (readonly, vec)
+    }
+
+    /// Retrieve added `Blocks` as bytes, ready to be written to persistent
+    /// storage
+    pub fn into_readonly_and_added_bytes(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<u8>) {
+        let (readonly, vec) = self.into_readonly_and_added();
+        let bytes = unsafe {
+            Vec::from_raw_parts(
+                vec.as_ptr() as *mut u8,
+                vec.len() * BLOCK_SIZE,
+                vec.capacity() * BLOCK_SIZE,
+            )
+        };
+        mem::forget(vec);
+        (readonly, bytes)
+    }
+
     /// Total number of blocks
     fn len(&self) -> usize {
         self.readonly.len() + self.growable.len() + 1
@@ -366,6 +421,42 @@
     }
 }
 
+pub struct NodeTreeBytes {
+    buffer: Box<dyn Deref<Target = [u8]> + Send>,
+    offset: usize,
+    len_in_blocks: usize,
+}
+
+impl NodeTreeBytes {
+    fn new(
+        buffer: Box<dyn Deref<Target = [u8]> + Send>,
+        offset: usize,
+        amount: usize,
+    ) -> Self {
+        assert!(buffer.len() >= offset + amount);
+        let len_in_blocks = amount / BLOCK_SIZE;
+        NodeTreeBytes {
+            buffer,
+            offset,
+            len_in_blocks,
+        }
+    }
+}
+
+impl Deref for NodeTreeBytes {
+    type Target = [Block];
+
+    fn deref(&self) -> &[Block] {
+        unsafe {
+            slice::from_raw_parts(
+                (&self.buffer).as_ptr().offset(self.offset as isize)
+                    as *const Block,
+                self.len_in_blocks,
+            )
+        }
+    }
+}
+
 struct NodeTreeVisitor<'n, 'p> {
     nt: &'n NodeTree,
     prefix: NodePrefixRef<'p>,
@@ -710,4 +801,31 @@
 
         Ok(())
     }
+
+    #[test]
+    fn test_into_added_empty() {
+        assert!(sample_nodetree().into_readonly_and_added().1.is_empty());
+        assert!(sample_nodetree()
+            .into_readonly_and_added_bytes()
+            .1
+            .is_empty());
+    }
+
+    #[test]
+    fn test_into_added_bytes() -> Result<(), NodeMapError> {
+        let mut idx = TestNtIndex::new();
+        idx.insert(0, "1234")?;
+        let mut idx = idx.commit();
+        idx.insert(4, "cafe")?;
+        let (_, bytes) = idx.nt.into_readonly_and_added_bytes();
+
+        // only the root block has been changed
+        assert_eq!(bytes.len(), BLOCK_SIZE);
+        // big endian for -2
+        assert_eq!(&bytes[4..2 * 4], [255, 255, 255, 254]);
+        // big endian for -6
+        assert_eq!(&bytes[12 * 4..13 * 4], [255, 255, 255, 250]);
+        Ok(())
+    }
+
 }



To: gracinet, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
gracinet updated this revision to Diff 19137.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7796?vs=19045&id=19137

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,8 +17,10 @@
     RevlogIndex,
 };
 use std::fmt;
+use std::mem;
 use std::ops::Deref;
 use std::ops::Index;
+use std::slice;
 
 #[derive(Debug, PartialEq)]
 pub enum NodeMapError {
@@ -132,6 +134,8 @@
 #[derive(Clone, PartialEq)]
 pub struct Block([RawElement; 16]);
 
+pub const BLOCK_SIZE: usize = mem::size_of::<Block>();
+
 impl Block {
     fn new() -> Self {
         Block([-1; 16])
@@ -221,6 +225,57 @@
         }
     }
 
+    /// Create from an opaque bunch of bytes
+    ///
+    /// The created `NodeTreeBytes` is taken after the fixed `offset` from
+    /// `buffer`, of which exactly `amount` bytes are used.
+    ///
+    /// - `buffer` could be derived from `PyBuffer` and `Mmap` objects.
+    /// - `offset` allows for the final file format to include fixed data
+    ///   (generation number, behavioural flags)
+    /// - `amount` is expressed in bytes, and is not automatically derived from
+    ///   `bytes`, so that a caller that manages them atomically can perform
+    ///   temporary disk serializations and still rollback easily if needed.
+    ///   First use-case for this would be to support Mercurial shell hooks.
+    ///
+    /// panics if `buffer` is smaller than `offset + amount`
+    pub fn load_bytes(
+        bytes: Box<dyn Deref<Target = [u8]> + Send>,
+        offset: usize,
+        amount: usize,
+    ) -> Self {
+        NodeTree::new(Box::new(NodeTreeBytes::new(bytes, offset, amount)))
+    }
+
+    /// Retrieve added `Block` and the original immutable data
+    pub fn into_readonly_and_added(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<Block>) {
+        let mut vec = self.growable;
+        let readonly = self.readonly;
+        if readonly.last() != Some(&self.root) {
+            vec.push(self.root);
+        }
+        (readonly, vec)
+    }
+
+    /// Retrieve added `Blocks` as bytes, ready to be written to persistent
+    /// storage
+    pub fn into_readonly_and_added_bytes(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<u8>) {
+        let (readonly, vec) = self.into_readonly_and_added();
+        let bytes = unsafe {
+            Vec::from_raw_parts(
+                vec.as_ptr() as *mut u8,
+                vec.len() * BLOCK_SIZE,
+                vec.capacity() * BLOCK_SIZE,
+            )
+        };
+        mem::forget(vec);
+        (readonly, bytes)
+    }
+
     /// Total number of blocks
     fn len(&self) -> usize {
         self.readonly.len() + self.growable.len() + 1
@@ -366,6 +421,42 @@
     }
 }
 
+pub struct NodeTreeBytes {
+    buffer: Box<dyn Deref<Target = [u8]> + Send>,
+    offset: usize,
+    len_in_blocks: usize,
+}
+
+impl NodeTreeBytes {
+    fn new(
+        buffer: Box<dyn Deref<Target = [u8]> + Send>,
+        offset: usize,
+        amount: usize,
+    ) -> Self {
+        assert!(buffer.len() >= offset + amount);
+        let len_in_blocks = amount / BLOCK_SIZE;
+        NodeTreeBytes {
+            buffer,
+            offset,
+            len_in_blocks,
+        }
+    }
+}
+
+impl Deref for NodeTreeBytes {
+    type Target = [Block];
+
+    fn deref(&self) -> &[Block] {
+        unsafe {
+            slice::from_raw_parts(
+                (&self.buffer).as_ptr().offset(self.offset as isize)
+                    as *const Block,
+                self.len_in_blocks,
+            )
+        }
+    }
+}
+
 struct NodeTreeVisitor<'n, 'p> {
     nt: &'n NodeTree,
     prefix: NodePrefixRef<'p>,
@@ -710,4 +801,30 @@
 
         Ok(())
     }
+
+    #[test]
+    fn test_into_added_empty() {
+        assert!(sample_nodetree().into_readonly_and_added().1.is_empty());
+        assert!(sample_nodetree()
+            .into_readonly_and_added_bytes()
+            .1
+            .is_empty());
+    }
+
+    #[test]
+    fn test_into_added_bytes() -> Result<(), NodeMapError> {
+        let mut idx = TestNtIndex::new();
+        idx.insert(0, "1234")?;
+        let mut idx = idx.commit();
+        idx.insert(4, "cafe")?;
+        let (_, bytes) = idx.nt.into_readonly_and_added_bytes();
+
+        // only the root block has been changed
+        assert_eq!(bytes.len(), BLOCK_SIZE);
+        // big endian for -2
+        assert_eq!(&bytes[4..2 * 4], [255, 255, 255, 254]);
+        // big endian for -6
+        assert_eq!(&bytes[12 * 4..13 * 4], [255, 255, 255, 250]);
+        Ok(())
+    }
 }



To: gracinet, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
gracinet updated this revision to Diff 19329.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7796?vs=19137&id=19329

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,8 +17,10 @@
     RevlogIndex,
 };
 use std::fmt;
+use std::mem;
 use std::ops::Deref;
 use std::ops::Index;
+use std::slice;
 
 #[derive(Debug, PartialEq)]
 pub enum NodeMapError {
@@ -132,6 +134,8 @@
 #[derive(Clone, PartialEq)]
 pub struct Block([RawElement; 16]);
 
+pub const BLOCK_SIZE: usize = mem::size_of::<Block>();
+
 impl Block {
     fn new() -> Self {
         Block([-1; 16])
@@ -219,6 +223,57 @@
         }
     }
 
+    /// Create from an opaque bunch of bytes
+    ///
+    /// The created `NodeTreeBytes` is taken after the fixed `offset` from
+    /// `buffer`, of which exactly `amount` bytes are used.
+    ///
+    /// - `buffer` could be derived from `PyBuffer` and `Mmap` objects.
+    /// - `offset` allows for the final file format to include fixed data
+    ///   (generation number, behavioural flags)
+    /// - `amount` is expressed in bytes, and is not automatically derived from
+    ///   `bytes`, so that a caller that manages them atomically can perform
+    ///   temporary disk serializations and still rollback easily if needed.
+    ///   First use-case for this would be to support Mercurial shell hooks.
+    ///
+    /// panics if `buffer` is smaller than `offset + amount`
+    pub fn load_bytes(
+        bytes: Box<dyn Deref<Target = [u8]> + Send>,
+        offset: usize,
+        amount: usize,
+    ) -> Self {
+        NodeTree::new(Box::new(NodeTreeBytes::new(bytes, offset, amount)))
+    }
+
+    /// Retrieve added `Block` and the original immutable data
+    pub fn into_readonly_and_added(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<Block>) {
+        let mut vec = self.growable;
+        let readonly = self.readonly;
+        if readonly.last() != Some(&self.root) {
+            vec.push(self.root);
+        }
+        (readonly, vec)
+    }
+
+    /// Retrieve added `Blocks` as bytes, ready to be written to persistent
+    /// storage
+    pub fn into_readonly_and_added_bytes(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<u8>) {
+        let (readonly, vec) = self.into_readonly_and_added();
+        let bytes = unsafe {
+            Vec::from_raw_parts(
+                vec.as_ptr() as *mut u8,
+                vec.len() * BLOCK_SIZE,
+                vec.capacity() * BLOCK_SIZE,
+            )
+        };
+        mem::forget(vec);
+        (readonly, bytes)
+    }
+
     /// Total number of blocks
     fn len(&self) -> usize {
         self.readonly.len() + self.growable.len() + 1
@@ -364,6 +419,42 @@
     }
 }
 
+pub struct NodeTreeBytes {
+    buffer: Box<dyn Deref<Target = [u8]> + Send>,
+    offset: usize,
+    len_in_blocks: usize,
+}
+
+impl NodeTreeBytes {
+    fn new(
+        buffer: Box<dyn Deref<Target = [u8]> + Send>,
+        offset: usize,
+        amount: usize,
+    ) -> Self {
+        assert!(buffer.len() >= offset + amount);
+        let len_in_blocks = amount / BLOCK_SIZE;
+        NodeTreeBytes {
+            buffer,
+            offset,
+            len_in_blocks,
+        }
+    }
+}
+
+impl Deref for NodeTreeBytes {
+    type Target = [Block];
+
+    fn deref(&self) -> &[Block] {
+        unsafe {
+            slice::from_raw_parts(
+                (&self.buffer).as_ptr().offset(self.offset as isize)
+                    as *const Block,
+                self.len_in_blocks,
+            )
+        }
+    }
+}
+
 struct NodeTreeVisitor<'n, 'p> {
     nt: &'n NodeTree,
     prefix: NodePrefixRef<'p>,
@@ -708,4 +799,30 @@
 
         Ok(())
     }
+
+    #[test]
+    fn test_into_added_empty() {
+        assert!(sample_nodetree().into_readonly_and_added().1.is_empty());
+        assert!(sample_nodetree()
+            .into_readonly_and_added_bytes()
+            .1
+            .is_empty());
+    }
+
+    #[test]
+    fn test_into_added_bytes() -> Result<(), NodeMapError> {
+        let mut idx = TestNtIndex::new();
+        idx.insert(0, "1234")?;
+        let mut idx = idx.commit();
+        idx.insert(4, "cafe")?;
+        let (_, bytes) = idx.nt.into_readonly_and_added_bytes();
+
+        // only the root block has been changed
+        assert_eq!(bytes.len(), BLOCK_SIZE);
+        // big endian for -2
+        assert_eq!(&bytes[4..2 * 4], [255, 255, 255, 254]);
+        // big endian for -6
+        assert_eq!(&bytes[12 * 4..13 * 4], [255, 255, 255, 250]);
+        Ok(())
+    }
 }



To: gracinet, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
This revision now requires changes to proceed.
kevincox added inline comments.
kevincox requested changes to this revision.

INLINE COMMENTS

> nodemap.rs:268
> +            Vec::from_raw_parts(
> +                vec.as_ptr() as *mut u8,
> +                vec.len() * BLOCK_SIZE,

This makes me uncomfortable. There is really no guarantee that there is no padding around `Block` or `RawElement`. If there is any padding this is UB. I would much prefer one of:

1. Store growable as a set of bytes and convert on get/set.
2. Add a method that outputs the bytes to any `Write`. In theory this will be slower but it is probably immaterial (especially if there is no padding).

> nodemap.rs:272
> +            )
> +        };
> +        mem::forget(vec);

If someone panics between the `from_raw_parts` and `mem::forget` this is a double free. Right now I think this can't happen but it is completely possible that the code changes between then and now. I would prefer using `Vec::from_raw_parts` to make sure that there is only one `Vec` alive with that pointer at a time. This risks a leak in the face of panic but I think that is much better.

> nodemap.rs:433
> +        amount: usize,
> +    ) -> Self {
> +        assert!(buffer.len() >= offset + amount);

This is a weird API. Why does new take a buffer and an offset? Can it take either a slice or just remove the offset parameter and always have the buffer start at 0? It is very strange to be passing in data that we own but isn't ever used.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
kevincox added inline comments.

INLINE COMMENTS

> kevincox wrote in nodemap.rs:268
> This makes me uncomfortable. There is really no guarantee that there is no padding around `Block` or `RawElement`. If there is any padding this is UB. I would much prefer one of:
>
> 1. Store growable as a set of bytes and convert on get/set.
> 2. Add a method that outputs the bytes to any `Write`. In theory this will be slower but it is probably immaterial (especially if there is no padding).

I thought about this more and I think I am okay doing it this way. It seems like this should be well defined as long as there is no padding. However on that note I would want to add a check that there is no padding as expected. I would also like to ensure that this fails to compile if there is ever padding, even in release mode. I think this can be accomplished by something like:

  let _: [u8; 4 * BLOCK_SIZE] = std::mem::transmute([Block::new(); 4]);

I would probably want to repeat this check near any code that is relying on this invariant.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
gracinet added a comment.


  @kevincox (replying hastily because I'm also finishing something unrelated today).
  Doesn't `mem::size_of` guarantee to take any padding into account? At least that's what the doc seems to say: https://doc.rust-lang.org/std/mem/fn.size_of.html

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
kevincox added a comment.


  Ah yes. So my test would be ineffective.
 
  The problem that IIUC padding is not guaranteed to be initialized, and in rust all values need to be initialized. So if you cast a type with padding to a `u8` it is undefined behaviour.
 
  So we would need a test that shows that the type is the same size as the sum of the contained `i32`s.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
kevincox added a comment.


  Maybe what we should do is keep the test I proposed, but change the definition of `BLOCK_SIZE` to be based upon the number of contained `i32` instead of `size_of`.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
gracinet edited the summary of this revision.
gracinet updated this revision to Diff 19636.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7796?vs=19329&id=19636

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,8 +17,10 @@
 };
 
 use std::fmt;
+use std::mem;
 use std::ops::Deref;
 use std::ops::Index;
+use std::slice;
 
 #[derive(Debug, PartialEq)]
 pub enum NodeMapError {
@@ -175,6 +177,8 @@
 #[derive(Clone, PartialEq)]
 pub struct Block([RawElement; 16]);
 
+pub const BLOCK_SIZE: usize = mem::size_of::<Block>();
+
 impl Block {
     fn new() -> Self {
         Block([-1; 16])
@@ -262,6 +266,56 @@
         }
     }
 
+    /// Create from an opaque bunch of bytes
+    ///
+    /// The created `NodeTreeBytes` from `buffer`,
+    /// of which exactly `amount` bytes are used.
+    ///
+    /// - `buffer` could be derived from `PyBuffer` and `Mmap` objects.
+    /// - `offset` allows for the final file format to include fixed data
+    ///   (generation number, behavioural flags)
+    /// - `amount` is expressed in bytes, and is not automatically derived from
+    ///   `bytes`, so that a caller that manages them atomically can perform
+    ///   temporary disk serializations and still rollback easily if needed.
+    ///   First use-case for this would be to support Mercurial shell hooks.
+    ///
+    /// panics if `buffer` is smaller than `amount`
+    pub fn load_bytes(
+        bytes: Box<dyn Deref<Target = [u8]> + Send>,
+        amount: usize,
+    ) -> Self {
+        NodeTree::new(Box::new(NodeTreeBytes::new(bytes, amount)))
+    }
+
+    /// Retrieve added `Block` and the original immutable data
+    pub fn into_readonly_and_added(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<Block>) {
+        let mut vec = self.growable;
+        let readonly = self.readonly;
+        if readonly.last() != Some(&self.root) {
+            vec.push(self.root);
+        }
+        (readonly, vec)
+    }
+
+    /// Retrieve added `Blocks` as bytes, ready to be written to persistent
+    /// storage
+    pub fn into_readonly_and_added_bytes(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<u8>) {
+        let (readonly, vec) = self.into_readonly_and_added();
+        let bytes = unsafe {
+            Vec::from_raw_parts(
+                vec.as_ptr() as *mut u8,
+                vec.len() * BLOCK_SIZE,
+                vec.capacity() * BLOCK_SIZE,
+            )
+        };
+        mem::forget(vec);
+        (readonly, bytes)
+    }
+
     /// Total number of blocks
     fn len(&self) -> usize {
         self.readonly.len() + self.growable.len() + 1
@@ -410,6 +464,38 @@
     }
 }
 
+pub struct NodeTreeBytes {
+    buffer: Box<dyn Deref<Target = [u8]> + Send>,
+    len_in_blocks: usize,
+}
+
+impl NodeTreeBytes {
+    fn new(
+        buffer: Box<dyn Deref<Target = [u8]> + Send>,
+        amount: usize,
+    ) -> Self {
+        assert!(buffer.len() >= amount);
+        let len_in_blocks = amount / BLOCK_SIZE;
+        NodeTreeBytes {
+            buffer,
+            len_in_blocks,
+        }
+    }
+}
+
+impl Deref for NodeTreeBytes {
+    type Target = [Block];
+
+    fn deref(&self) -> &[Block] {
+        unsafe {
+            slice::from_raw_parts(
+                (&self.buffer).as_ptr() as *const Block,
+                self.len_in_blocks,
+            )
+        }
+    }
+}
+
 struct NodeTreeVisitor<'n, 'p> {
     nt: &'n NodeTree,
     prefix: NodePrefixRef<'p>,
@@ -786,4 +872,30 @@
 
         Ok(())
     }
+
+    #[test]
+    fn test_into_added_empty() {
+        assert!(sample_nodetree().into_readonly_and_added().1.is_empty());
+        assert!(sample_nodetree()
+            .into_readonly_and_added_bytes()
+            .1
+            .is_empty());
+    }
+
+    #[test]
+    fn test_into_added_bytes() -> Result<(), NodeMapError> {
+        let mut idx = TestNtIndex::new();
+        idx.insert(0, "1234")?;
+        let mut idx = idx.commit();
+        idx.insert(4, "cafe")?;
+        let (_, bytes) = idx.nt.into_readonly_and_added_bytes();
+
+        // only the root block has been changed
+        assert_eq!(bytes.len(), BLOCK_SIZE);
+        // big endian for -2
+        assert_eq!(&bytes[4..2 * 4], [255, 255, 255, 254]);
+        // big endian for -6
+        assert_eq!(&bytes[12 * 4..13 * 4], [255, 255, 255, 250]);
+        Ok(())
+    }
 }



To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
gracinet added inline comments.
gracinet marked 2 inline comments as done.

INLINE COMMENTS

> kevincox wrote in nodemap.rs:268
> I thought about this more and I think I am okay doing it this way. It seems like this should be well defined as long as there is no padding. However on that note I would want to add a check that there is no padding as expected. I would also like to ensure that this fails to compile if there is ever padding, even in release mode. I think this can be accomplished by something like:
>
>   let _: [u8; 4 * BLOCK_SIZE] = std::mem::transmute([Block::new(); 4]);
>
> I would probably want to repeat this check near any code that is relying on this invariant.

About a method to output to a writer: for the time being, we're avoiding doing I/O directly in Rust because most of it is shared with Python through layers that perform various sanitizations, lock management etc. That's why the simplest is to transfer bytes out.

Context note: the data structure (`Block`) is directly taken from the C version, which is non-persistent, but I believe that these 64 bytes are just some low-level programmer reflex. It's probably not a coincidence that IIRC that's also the length of cache lines on most current CPUs.

Anyway, the point of all of this is to get on disk without padding, so yes, we could implement our own non-padding by changing the definition of `Block` to `[u8; 64]`. In the end, we are forced to trust what's on disk anyway.

> kevincox wrote in nodemap.rs:272
> If someone panics between the `from_raw_parts` and `mem::forget` this is a double free. Right now I think this can't happen but it is completely possible that the code changes between then and now. I would prefer using `Vec::from_raw_parts` to make sure that there is only one `Vec` alive with that pointer at a time. This risks a leak in the face of panic but I think that is much better.

Not sure to understand what you suggest here, since I'm already using `Vec::from_raw_parts` in that method. I couldn't find an `into_raw_parts`.
Anyway, the official example <https://doc.rust-lang.org/std/vec/struct.Vec.html#examples-4>  has the `mem::forget` before  `Vec::from_raw_parts`. Would that be sufficient?

I agree that a leak upon some exceptional-cannot-happen condition is better than a double free.

Also, forgot to came back to that one in the latest phab update

> kevincox wrote in nodemap.rs:433
> This is a weird API. Why does new take a buffer and an offset? Can it take either a slice or just remove the offset parameter and always have the buffer start at 0? It is very strange to be passing in data that we own but isn't ever used.

Owning some memory and not using the whole of it is anyway what happens when that memory region is just obtained from a mmap (which is the whole point of carrying theses `Deref` all around).
Technically, in a mmap, I suppose we actually only own the file descriptor and the resulting pointer, but we can't just convert it to an inert slice.

Anyway it's now confirmed that we won't be needing the offset finally, so I've removed it.  Some data at the end of the bytes slice may be ignored, too : it would be the result of aborted transactions.

Note: the Python layer will maintain the amount of validated blocks in a separate small file which is updated atomically. Any future reimplementation of this in Rust would have to do the same.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
marmoute added a comment.


  I don't see anything wrong in this changeset, but there have been a lots of exchange between @kevincox and @gracinet. @kevincox Is there still code you are unconfortable with?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox
Cc: marmoute, durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
marmoute updated this revision to Diff 20025.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7796?vs=19636&id=20025

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,8 +17,10 @@
 };
 
 use std::fmt;
+use std::mem;
 use std::ops::Deref;
 use std::ops::Index;
+use std::slice;
 
 #[derive(Debug, PartialEq)]
 pub enum NodeMapError {
@@ -175,6 +177,8 @@
 #[derive(Clone, PartialEq)]
 pub struct Block([RawElement; 16]);
 
+pub const BLOCK_SIZE: usize = mem::size_of::<Block>();
+
 impl Block {
     fn new() -> Self {
         Block([-1; 16])
@@ -262,6 +266,56 @@
         }
     }
 
+    /// Create from an opaque bunch of bytes
+    ///
+    /// The created `NodeTreeBytes` from `buffer`,
+    /// of which exactly `amount` bytes are used.
+    ///
+    /// - `buffer` could be derived from `PyBuffer` and `Mmap` objects.
+    /// - `offset` allows for the final file format to include fixed data
+    ///   (generation number, behavioural flags)
+    /// - `amount` is expressed in bytes, and is not automatically derived from
+    ///   `bytes`, so that a caller that manages them atomically can perform
+    ///   temporary disk serializations and still rollback easily if needed.
+    ///   First use-case for this would be to support Mercurial shell hooks.
+    ///
+    /// panics if `buffer` is smaller than `amount`
+    pub fn load_bytes(
+        bytes: Box<dyn Deref<Target = [u8]> + Send>,
+        amount: usize,
+    ) -> Self {
+        NodeTree::new(Box::new(NodeTreeBytes::new(bytes, amount)))
+    }
+
+    /// Retrieve added `Block` and the original immutable data
+    pub fn into_readonly_and_added(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<Block>) {
+        let mut vec = self.growable;
+        let readonly = self.readonly;
+        if readonly.last() != Some(&self.root) {
+            vec.push(self.root);
+        }
+        (readonly, vec)
+    }
+
+    /// Retrieve added `Blocks` as bytes, ready to be written to persistent
+    /// storage
+    pub fn into_readonly_and_added_bytes(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<u8>) {
+        let (readonly, vec) = self.into_readonly_and_added();
+        let bytes = unsafe {
+            Vec::from_raw_parts(
+                vec.as_ptr() as *mut u8,
+                vec.len() * BLOCK_SIZE,
+                vec.capacity() * BLOCK_SIZE,
+            )
+        };
+        mem::forget(vec);
+        (readonly, bytes)
+    }
+
     /// Total number of blocks
     fn len(&self) -> usize {
         self.readonly.len() + self.growable.len() + 1
@@ -410,6 +464,38 @@
     }
 }
 
+pub struct NodeTreeBytes {
+    buffer: Box<dyn Deref<Target = [u8]> + Send>,
+    len_in_blocks: usize,
+}
+
+impl NodeTreeBytes {
+    fn new(
+        buffer: Box<dyn Deref<Target = [u8]> + Send>,
+        amount: usize,
+    ) -> Self {
+        assert!(buffer.len() >= amount);
+        let len_in_blocks = amount / BLOCK_SIZE;
+        NodeTreeBytes {
+            buffer,
+            len_in_blocks,
+        }
+    }
+}
+
+impl Deref for NodeTreeBytes {
+    type Target = [Block];
+
+    fn deref(&self) -> &[Block] {
+        unsafe {
+            slice::from_raw_parts(
+                (&self.buffer).as_ptr() as *const Block,
+                self.len_in_blocks,
+            )
+        }
+    }
+}
+
 struct NodeTreeVisitor<'n, 'p> {
     nt: &'n NodeTree,
     prefix: NodePrefixRef<'p>,
@@ -786,4 +872,30 @@
 
         Ok(())
     }
+
+    #[test]
+    fn test_into_added_empty() {
+        assert!(sample_nodetree().into_readonly_and_added().1.is_empty());
+        assert!(sample_nodetree()
+            .into_readonly_and_added_bytes()
+            .1
+            .is_empty());
+    }
+
+    #[test]
+    fn test_into_added_bytes() -> Result<(), NodeMapError> {
+        let mut idx = TestNtIndex::new();
+        idx.insert(0, "1234")?;
+        let mut idx = idx.commit();
+        idx.insert(4, "cafe")?;
+        let (_, bytes) = idx.nt.into_readonly_and_added_bytes();
+
+        // only the root block has been changed
+        assert_eq!(bytes.len(), BLOCK_SIZE);
+        // big endian for -2
+        assert_eq!(&bytes[4..2 * 4], [255, 255, 255, 254]);
+        // big endian for -6
+        assert_eq!(&bytes[12 * 4..13 * 4], [255, 255, 255, 250]);
+        Ok(())
+    }
 }



To: gracinet, #hg-reviewers, kevincox
Cc: marmoute, durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
This revision now requires changes to proceed.
durin42 added a comment.
durin42 requested changes to this revision.


  I'm just not comfortable with the `unsafe` block here, especially with the comments from @kevincox . That said, if the `unsafe` can't disappear, it probably needs a pretty thorough explanation of why it's correct for a novice rust reader to follow?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox, durin42
Cc: marmoute, durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
Alphare added a comment.


  @kevincox To re-iterate, I've taken over this series, so excuse any additional inaccuracies.

INLINE COMMENTS

> gracinet wrote in nodemap.rs:268
> About a method to output to a writer: for the time being, we're avoiding doing I/O directly in Rust because most of it is shared with Python through layers that perform various sanitizations, lock management etc. That's why the simplest is to transfer bytes out.
>
> Context note: the data structure (`Block`) is directly taken from the C version, which is non-persistent, but I believe that these 64 bytes are just some low-level programmer reflex. It's probably not a coincidence that IIRC that's also the length of cache lines on most current CPUs.
>
> Anyway, the point of all of this is to get on disk without padding, so yes, we could implement our own non-padding by changing the definition of `Block` to `[u8; 64]`. In the end, we are forced to trust what's on disk anyway.

@kevincox @durin42: I've changed the code to use the type-level `ManuallyDrop` instead of `mem::forget`, added the `transmute` to be extra paranoid and added a comment.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox, durin42
Cc: Alphare, marmoute, durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
Alphare updated this revision to Diff 20215.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7796?vs=20025&id=20215

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,8 +17,10 @@
 };
 
 use std::fmt;
+use std::mem;
 use std::ops::Deref;
 use std::ops::Index;
+use std::slice;
 
 #[derive(Debug, PartialEq)]
 pub enum NodeMapError {
@@ -172,9 +174,11 @@
 /// represented at all, because we want an immutable empty nodetree
 /// to be valid.
 
-#[derive(Clone, PartialEq)]
+#[derive(Copy, Clone, PartialEq)]
 pub struct Block([RawElement; 16]);
 
+pub const BLOCK_SIZE: usize = mem::size_of::<Block>();
+
 impl Block {
     fn new() -> Self {
         Block([-1; 16])
@@ -262,6 +266,65 @@
         }
     }
 
+    /// Create from an opaque bunch of bytes
+    ///
+    /// The created `NodeTreeBytes` from `buffer`,
+    /// of which exactly `amount` bytes are used.
+    ///
+    /// - `buffer` could be derived from `PyBuffer` and `Mmap` objects.
+    /// - `offset` allows for the final file format to include fixed data
+    ///   (generation number, behavioural flags)
+    /// - `amount` is expressed in bytes, and is not automatically derived from
+    ///   `bytes`, so that a caller that manages them atomically can perform
+    ///   temporary disk serializations and still rollback easily if needed.
+    ///   First use-case for this would be to support Mercurial shell hooks.
+    ///
+    /// panics if `buffer` is smaller than `amount`
+    pub fn load_bytes(
+        bytes: Box<dyn Deref<Target = [u8]> + Send>,
+        amount: usize,
+    ) -> Self {
+        NodeTree::new(Box::new(NodeTreeBytes::new(bytes, amount)))
+    }
+
+    /// Retrieve added `Block` and the original immutable data
+    pub fn into_readonly_and_added(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<Block>) {
+        let mut vec = self.growable;
+        let readonly = self.readonly;
+        if readonly.last() != Some(&self.root) {
+            vec.push(self.root);
+        }
+        (readonly, vec)
+    }
+
+    /// Retrieve added `Blocks` as bytes, ready to be written to persistent
+    /// storage
+    pub fn into_readonly_and_added_bytes(
+        self,
+    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<u8>) {
+        let (readonly, vec) = self.into_readonly_and_added();
+        // Prevent running `v`'s destructor so we are in complete control
+        // of the allocation.
+        let vec = mem::ManuallyDrop::new(vec);
+
+        let bytes = unsafe {
+            // This is safe because we check at compile-time that there is no
+            // padding.
+            // /!\ Any use of `vec` after this is use-after-free.
+
+            let _: [u8; 4 * BLOCK_SIZE] =
+                std::mem::transmute([Block::new(); 4]);
+            Vec::from_raw_parts(
+                vec.as_ptr() as *mut u8,
+                vec.len() * BLOCK_SIZE,
+                vec.capacity() * BLOCK_SIZE,
+            )
+        };
+        (readonly, bytes)
+    }
+
     /// Total number of blocks
     fn len(&self) -> usize {
         self.readonly.len() + self.growable.len() + 1
@@ -410,6 +473,38 @@
     }
 }
 
+pub struct NodeTreeBytes {
+    buffer: Box<dyn Deref<Target = [u8]> + Send>,
+    len_in_blocks: usize,
+}
+
+impl NodeTreeBytes {
+    fn new(
+        buffer: Box<dyn Deref<Target = [u8]> + Send>,
+        amount: usize,
+    ) -> Self {
+        assert!(buffer.len() >= amount);
+        let len_in_blocks = amount / BLOCK_SIZE;
+        NodeTreeBytes {
+            buffer,
+            len_in_blocks,
+        }
+    }
+}
+
+impl Deref for NodeTreeBytes {
+    type Target = [Block];
+
+    fn deref(&self) -> &[Block] {
+        unsafe {
+            slice::from_raw_parts(
+                (&self.buffer).as_ptr() as *const Block,
+                self.len_in_blocks,
+            )
+        }
+    }
+}
+
 struct NodeTreeVisitor<'n, 'p> {
     nt: &'n NodeTree,
     prefix: NodePrefixRef<'p>,
@@ -787,4 +882,30 @@
 
         Ok(())
     }
+
+    #[test]
+    fn test_into_added_empty() {
+        assert!(sample_nodetree().into_readonly_and_added().1.is_empty());
+        assert!(sample_nodetree()
+            .into_readonly_and_added_bytes()
+            .1
+            .is_empty());
+    }
+
+    #[test]
+    fn test_into_added_bytes() -> Result<(), NodeMapError> {
+        let mut idx = TestNtIndex::new();
+        idx.insert(0, "1234")?;
+        let mut idx = idx.commit();
+        idx.insert(4, "cafe")?;
+        let (_, bytes) = idx.nt.into_readonly_and_added_bytes();
+
+        // only the root block has been changed
+        assert_eq!(bytes.len(), BLOCK_SIZE);
+        // big endian for -2
+        assert_eq!(&bytes[4..2 * 4], [255, 255, 255, 254]);
+        // big endian for -6
+        assert_eq!(&bytes[12 * 4..13 * 4], [255, 255, 255, 250]);
+        Ok(())
+    }
 }



To: gracinet, #hg-reviewers, kevincox, durin42
Cc: Alphare, marmoute, durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
kevincox added inline comments.

INLINE COMMENTS

> Alphare wrote in nodemap.rs:268
> @kevincox @durin42: I've changed the code to use the type-level `ManuallyDrop` instead of `mem::forget`, added the `transmute` to be extra paranoid and added a comment.

This doesn't address the primary problem of padding. There is nothing in the code that guarantees that there will be no padding inserted into `Block` (although it is unlikely to ever happen). This is because if there is padding it won't be guaranteed to be initialized and reading uninitialized data is undefined behaviour (not to mention that it will give incorrect results). I would like to add something that guarantees this. The only thing I can think of that will give us this confidence is a check `sizeof::<Block>() == 4 * 16`. I was suggesting the transmute to assert this (since transpose checks the size of its type arguments are the same. However as I originally suggested and as currently written it doesn't do that.

My current proposal is change `BLOCK_SIZE` to be defined as `4 * 16` (possibly extracting that 16 into a `ELEMENTS_PER_BLOCK`) and keep the current transmute assertion. That paired with some comments explaining why we are doing that will make me confident that there is no padding and we aren't performing any undefined behaviour.

> gracinet wrote in nodemap.rs:272
> Not sure to understand what you suggest here, since I'm already using `Vec::from_raw_parts` in that method. I couldn't find an `into_raw_parts`.
> Anyway, the official example <https://doc.rust-lang.org/std/vec/struct.Vec.html#examples-4>  has the `mem::forget` before  `Vec::from_raw_parts`. Would that be sufficient?
>
> I agree that a leak upon some exceptional-cannot-happen condition is better than a double free.
>
> Also, forgot to came back to that one in the latest phab update

Ah, is missed that into_raw_parts is nightly-only at the moment. I think the forget is sufficient for now. Maybe leave a TODO to use into_raw_parts once stabilized.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox, durin42
Cc: Alphare, marmoute, durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

Re: D7796: rust-nodemap: input/output primitives

Yuya Nishihara
In reply to this post by mharbison72 (Matt Harbison)
> +    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<u8>) {
> +        let (readonly, vec) = self.into_readonly_and_added();
> +        // Prevent running `v`'s destructor so we are in complete control
> +        // of the allocation.
> +        let vec = mem::ManuallyDrop::new(vec);
> +
> +        let bytes = unsafe {
> +            // This is safe because we check at compile-time that there is no
> +            // padding.
> +            // /!\ Any use of `vec` after this is use-after-free.
> +
> +            let _: [u8; 4 * BLOCK_SIZE] =
> +                std::mem::transmute([Block::new(); 4]);
> +            Vec::from_raw_parts(
> +                vec.as_ptr() as *mut u8,
> +                vec.len() * BLOCK_SIZE,
> +                vec.capacity() * BLOCK_SIZE,
> +            )

Appears that this is unsafe. The doc states that the source type must have the
exact same alignment as `Vec<u8>` probably because the allocator may use
separate bucket per alignment.

https://doc.rust-lang.org/std/vec/struct.Vec.html#method.from_raw_parts

"It's also not safe to build one from a Vec<u16> and its length, because the
allocator cares about the alignment, and these two types have different alignments."

Can't we instead implement `as_bytes() -> &[u8]`?
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
yuja added a comment.


  > +    ) -> (Box<dyn Deref<Target = [Block]> + Send>, Vec<u8>) {
  > +        let (readonly, vec) = self.into_readonly_and_added();
  > +        // Prevent running `v`'s destructor so we are in complete control
  > +        // of the allocation.
  > +        let vec = mem::ManuallyDrop::new(vec);
  > +
  > +        let bytes = unsafe {
  > +            // This is safe because we check at compile-time that there is no
  > +            // padding.
  > +            // /!\ Any use of `vec` after this is use-after-free.
  > +
  > +            let _: [u8; 4 * BLOCK_SIZE] =
  > +                std::mem::transmute([Block::new(); 4]);
  > +            Vec::from_raw_parts(
  > +                vec.as_ptr() as *mut u8,
  > +                vec.len() * BLOCK_SIZE,
  > +                vec.capacity() * BLOCK_SIZE,
  > +            )
 
  Appears that this is unsafe. The doc states that the source type must have the
  exact same alignment as `Vec<u8>` probably because the allocator may use
  separate bucket per alignment.
 
  https://doc.rust-lang.org/std/vec/struct.Vec.html#method.from_raw_parts
 
  "It's also not safe to build one from a Vec<u16> and its length, because the
  allocator cares about the alignment, and these two types have different alignments."
 
  Can't we instead implement `as_bytes() -> &[u8]`?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox, durin42
Cc: yuja, Alphare, marmoute, durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
Alphare added inline comments.

INLINE COMMENTS

> kevincox wrote in nodemap.rs:268
> This doesn't address the primary problem of padding. There is nothing in the code that guarantees that there will be no padding inserted into `Block` (although it is unlikely to ever happen). This is because if there is padding it won't be guaranteed to be initialized and reading uninitialized data is undefined behaviour (not to mention that it will give incorrect results). I would like to add something that guarantees this. The only thing I can think of that will give us this confidence is a check `sizeof::<Block>() == 4 * 16`. I was suggesting the transmute to assert this (since transpose checks the size of its type arguments are the same. However as I originally suggested and as currently written it doesn't do that.
>
> My current proposal is change `BLOCK_SIZE` to be defined as `4 * 16` (possibly extracting that 16 into a `ELEMENTS_PER_BLOCK`) and keep the current transmute assertion. That paired with some comments explaining why we are doing that will make me confident that there is no padding and we aren't performing any undefined behaviour.

I think that would be sufficient indeed. How about making `Block` into a `[u8; 64]`? It will be less expressive, but this will no longer be needed.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox, durin42
Cc: yuja, Alphare, marmoute, durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D7796: rust-nodemap: input/output primitives

mharbison72 (Matt Harbison)
In reply to this post by mharbison72 (Matt Harbison)
kevincox added inline comments.

INLINE COMMENTS

> Alphare wrote in nodemap.rs:268
> I think that would be sufficient indeed. How about making `Block` into a `[u8; 64]`? It will be less expressive, but this will no longer be needed.

I like that. It makes it much more explicit that it is a backing array and forces us to have proper endiness hygiene.

I'm also willing to assume that `[u8; 64]` will be 64 bytes with no padding forever.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox, durin42
Cc: yuja, Alphare, marmoute, durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
12